My data looks like this:
>>> df1.show()
+-----------------+--------------------+
| corruptNames| standardNames|
+-----------------+--------------------+
|Sid is (Good boy)| Sid is Good Boy|
| New York Life| New York Life In...|
+-----------------+--------------------+
So, based on the data above, I need to apply a regex to create a new column whose values come out like the second column, i.e. standardNames. I tried the following code:
spark.sql("select *,case when corruptNames rlike '[^a-zA-Z ()]+(?![^(]*))' or corruptNames rlike 'standardNames' then standardNames else 0 end as standard from temp1").show()
It throws the following error:
pyspark.sql.utils.AnalysisException: "cannot resolve '`standardNames`' given input columns: [temp1.corruptNames,temp1. standardNames];
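Note that the error message lists the input columns as `[temp1.corruptNames, temp1. standardNames]`: the second name appears to carry a leading space, which would explain why `standardNames` cannot be resolved. A minimal sketch of the fix, assuming that is the cause (plain Python here, mirroring what `df1.columns` would return under that assumption):

```python
# Hypothetical column list, as hinted by the AnalysisException:
# the second name was read in with a leading space.
raw_columns = ["corruptNames", " standardNames"]

# Strip stray whitespace so Spark SQL can resolve the names.
clean_columns = [c.strip() for c in raw_columns]
print(clean_columns)  # ['corruptNames', 'standardNames']
```

In PySpark you could apply this rename before registering the temp view, e.g. `df1 = df1.toDF(*[c.strip() for c in df1.columns])`, or alternatively keep the column as-is and backtick-quote the name including the space in the SQL (`` ` standardNames` ``).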