I have a Spark DataFrame with a single array column COL1:
+-------------------------+
|COL1                     |
+-------------------------+
|[101:10001:A,102:10002:B]|
|[201:20001:B,202:20002:A]|
+-------------------------+
For every element in the array I want to split on ':' and, if the last part of the split is 'A', keep the first part (e.g. 101), otherwise null.
Expected output:
+----------+
|COL2      |
+----------+
|[101,None]|
|[None,202]|
+----------+
Code:
expr = """
transform(COL1,e -> when(split(e,':')[2] == 'A',split(e,':')[0]).otherwise(None)
)
"""
df = df.withColumn("COL2",expr(expr))
I get a ParseException:

extraneous input '(' expecting {')', ','}(line 3, pos 70)

== SQL ==

    transform(COL1,
    e -> when(split(e,':')[0]).otherwise(None)
----------------------------------------------------------------------^^^
    )
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/functions.py", line 675, in expr
    return Column(sc._jvm.functions.expr(str))
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
    raise ParseException(s.split(': ', 1)[1], stackTrace)