
How to subtract two string columns in a PySpark DataFrame?


I want to subtract column1 - column2, that is, remove from column1 every substring that also appears in column2, and put the result in a new column named result.

PySpark DataFrame:

+--+-------------------------+--------------------------+--------------+
|ID|           column1       |   column2                | result       |
+--+-------------------------+--------------------------+--------------+
|1 | Hi how are you fine but | Hi I am fine how about u | are you but  |
|2 | javascript python XML   | python XML               | javascript   |
|3 | include all the inform  | include inform           | all the      |
+--+-------------------------+--------------------------+--------------+
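You can split both columns on spaces and use array_except to remove from column1 all the substrings that are present in column2, then join the remaining words back together with array_join: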

from pyspark.sql import functions as F

df1 = df.withColumn(
    "result",
    F.array_join(
        F.array_except(F.split("column1", " "), F.split("column2", " ")),
        " "
    )
)

df1.show(truncate=False)

#+---+-----------------------+------------------------+-----------+
#|ID |column1                |column2                 |result     |
#+---+-----------------------+------------------------+-----------+
#|1  |Hi how are you fine but|Hi I am fine how about u|are you but|
#|2  |javascript python XML  |python XML              |javascript |
#|3  |include all the inform |include inform          |all the    |
#+---+-----------------------+------------------------+-----------+
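
To make the per-row logic easier to follow, here is a minimal plain-Python sketch of what the split / array_except / array_join chain computes for one row. It does not use Spark. The helper name subtract_words is hypothetical and is not part of PySpark. The sketch assumes tokens are separated by single spaces and that the remaining tokens keep their order of first appearance; array_except is documented as returning its result without duplicates, and the sketch mirrors that.

```python
def subtract_words(text: str, to_remove: str) -> str:
    """Drop from `text` every space-separated token that occurs in `to_remove`.

    Hypothetical helper that imitates, for a single row,
    array_join(array_except(split(text, " "), split(to_remove, " ")), " ").
    """
    banned = set(to_remove.split(" "))
    seen = set()   # array_except returns no duplicates, so track what was kept
    kept = []
    for token in text.split(" "):
        if token in banned or token in seen:
            continue
        seen.add(token)
        kept.append(token)
    return " ".join(kept)


# The three rows from the question
rows = [
    ("Hi how are you fine but", "Hi I am fine how about u"),
    ("javascript python XML", "python XML"),
    ("include all the inform", "include inform"),
]

for column1, column2 in rows:
    print(subtract_words(column1, column2))
```

One consequence worth keeping in mind: because duplicates are dropped, a word that is repeated in column1 and absent from column2 appears only once in the result.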

@blackishop The strings in column2 that have no match were moved into the result. Sorry for the confusion.

@Blackbishop Sorry for the late reply. You did in fact answer my question.