Apache Spark: transforming an array's elements after splitting

apache-spark, pyspark

I have a Spark DataFrame with an array column COL1:
+--------------------------+
|COL1                      |
+--------------------------+
|[101:10001:A, 102:10002:B]|
|[201:20001:B, 202:20002:A]|
+--------------------------+
For every element of the array I want to split on `:`; if the last part of the split is A, take the first part (e.g. 101), otherwise None.
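The per-element rule can be sketched in plain Python (the function name and sample values are illustrative, not part of the actual Spark job):

```python
# Plain-Python sketch of the per-element rule: split on ':',
# keep the first part only when the last part is 'A'.
def pick_first_if_a(element):
    parts = element.split(":")          # "101:10001:A" -> ["101", "10001", "A"]
    return parts[0] if parts[-1] == "A" else None

row = ["101:10001:A", "102:10002:B"]
print([pick_first_if_a(e) for e in row])   # ['101', None]
```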
Expected output:
+------------+
|COL2        |
+------------+
|[101, None] |
|[None, 202] |
+------------+
Code:

I get an exception:
extraneous input '(' expecting {')', ','}(line 3, pos 70)

== SQL ==

    transform(COL1,
        e -> when(split(e,':')[2] == 'A', split(e, ':')[0]).otherwise(None)
----------------------------------------------------------------------^^^
    )

Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/functions.py", line 675, in expr
    return Column(sc._jvm.functions.expr(str))
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
    raise ParseException(s.split(': ', 1)[1], stackTrace)
Inside the expr() function you need to use SQL syntax, not the Python API. You also have to rename your variable expr, which shadows the API function expr, and pass the expression as a string. (If you don't want the literal string None, use NULL in SQL instead):
from pyspark.sql.functions import expr

# SQL syntax inside expr(): IF(...) instead of when().otherwise(),
# = instead of ==, and the string literal "None" instead of Python's None.
expr1 = """
    transform(COL1,
        e -> IF(split(e, ':')[2] = 'A', split(e, ':')[0], "None")
    )
"""
df.withColumn("COL2", expr(expr1)).show()
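The shadowing problem the answer points out is plain Python behavior, not anything pyspark-specific. A minimal illustration (the stand-in expr function below is hypothetical, simulating pyspark.sql.functions.expr):

```python
# Stand-in for pyspark.sql.functions.expr (hypothetical, for illustration).
def expr(sql):
    return f"Column<{sql}>"

# Rebinding the name `expr` to a string destroys access to the function:
expr = "transform(COL1, ...)"

try:
    expr("split(e, ':')[0]")    # looks like a call, but `expr` is now a str
except TypeError as err:
    print(err)                  # 'str' object is not callable
```

This is why the answer says to rename the variable: once `expr = """..."""` runs, the imported expr function is gone for the rest of the scope.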