How to subtract two string columns in a PySpark DataFrame?
I want to subtract column1 - column2, i.e. remove from column1 every substring that also appears in column2, and put the result in a new column named result.
PySpark DataFrame:
+--+-------------------------+--------------------------+--------------+
|ID| column1 | column2 | result |
+--+-------------------------+--------------------------+--------------+
|1 | Hi how are you fine but | Hi I am fine how about u | are you but |
|2 | javascript python XML | python XML | javascript |
|3 | include all the inform | include inform | all the |
+--+-------------------------+--------------------------+--------------+
You can use split together with array_except and array_join to remove from column1 all the substrings that are present in column2:
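from pyspark.sql import functions as F

# Split both columns into words, keep the words of column1 that are absent from column2, then rejoin them
df1 = df.withColumn(
    "result",
    F.array_join(
        F.array_except(F.split("column1", " "), F.split("column2", " ")),
        " "
    )
)
df1.show(truncate=False)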
#+---+-----------------------+------------------------+-----------+
#|ID |column1 |column2 |result |
#+---+-----------------------+------------------------+-----------+
#|1 |Hi how are you fine but|Hi I am fine how about u|are you but|
#|2 |javascript python XML |python XML |javascript |
#|3 |include all the inform |include inform |all the |
#+---+-----------------------+------------------------+-----------+
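To make the word-level behaviour easier to see, here is a rough pure-Python sketch of what the split / array_except / array_join chain does to a single row. The function name subtract_words is made up for illustration and is not part of PySpark. It assumes words are separated by single spaces, and it mirrors the documented behaviour of array_except, which returns the elements of the first array that are not in the second, without duplicates. Keeping the words in their original order is an assumption of this sketch.

```python
def subtract_words(left: str, right: str) -> str:
    """Hypothetical helper: drop from `left` every word that occurs in `right`."""
    exclude = set(right.split(" "))  # words of column2
    seen = set()                     # array_except also removes duplicates
    kept = []
    for word in left.split(" "):     # words of column1, in original order
        if word not in exclude and word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)


rows = [
    ("Hi how are you fine but", "Hi I am fine how about u"),
    ("javascript python XML", "python XML"),
    ("include all the inform", "include inform"),
]
results = [subtract_words(c1, c2) for c1, c2 in rows]
print(results)
```

One consequence worth keeping in mind: a word that is repeated in column1 and not excluded appears only once in the result, so this is a set-style difference on words rather than a character-level substring removal.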
The strings in column2 that do not match were moved into the result. Apologies for the confusion. @blackbishop
@Blackbishop Sorry for the delayed reply. You did actually answer my question.