
Scala: remove duplicates within columns

scala, apache-spark

Can I eliminate the duplicate values in Column_3 and Column_4?

+--------+--------+--------+--------+
|Column_1|Column_2|Column_3|Column_4|
+--------+--------+--------+--------+
|       1|       x|     abc|     www|
|       1|       x|     abc|     sdf|
|       1|       x|     abc|     xyz|
|       1|       x|     def|     www|
|       1|       x|     def|     sdf|
|       1|       x|     def|     xyz|
+--------+--------+--------+--------+
Expected output:

+--------+--------+--------+--------+
|Column_1|Column_2|Column_3|Column_4|
+--------+--------+--------+--------+
|       1|       x|     abc|     www|
|       1|       x|     def|     sdf|
|       1|       x|    null|     xyz|
+--------+--------+--------+--------+
Use df.dropDuplicates("Column_3", "Column_4")
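For context, a minimal sketch (in spark-shell, against the sample data from the question) of what that call does: dropDuplicates keeps one arbitrary row per distinct combination of the listed columns, and since every (Column_3, Column_4) pair in this input is already unique, it drops nothing here.

// Sample data from the question (spark-shell, spark.implicits._ in scope)
val df = Seq(
  (1, "x", "abc", "www"), (1, "x", "abc", "sdf"), (1, "x", "abc", "xyz"),
  (1, "x", "def", "www"), (1, "x", "def", "sdf"), (1, "x", "def", "xyz")
).toDF("Column_1", "Column_2", "Column_3", "Column_4")

// All six (Column_3, Column_4) pairs are distinct, so nothing is removed:
df.dropDuplicates("Column_3", "Column_4").count()   // 6, same as df.count()

// Deduplicating one column at a time does shrink the data:
df.dropDuplicates("Column_3").count()               // 2  (abc, def)
df.dropDuplicates("Column_4").count()               // 3  (www, sdf, xyz)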

Also, possible duplicate of

scala> val df1 = Seq((1, "x", "abc"), (1, "x", "def")).toDF("Column_1", "Column_2", "Column_3")
scala> val df2 = Seq((1, "x", "xyz"), (1, "x", "sdf")).toDF("Column_1", "Column_2", "Column_4")
scala> val df3 = df1.join(df2, Seq("Column_1", "Column_2"), "outer")

scala> df3.show
+--------+--------+--------+--------+
|Column_1|Column_2|Column_3|Column_4|
+--------+--------+--------+--------+
|       1|       x|     abc|     xyz|
|       1|       x|     abc|     sdf|
|       1|       x|     def|     xyz|
|       1|       x|     def|     sdf|
+--------+--------+--------+--------+

scala> df3.dropDuplicates("column_3", "column_4")
res68: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [column_1: int, column_2: string ... 2 more fields]

scala> res68.show
+--------+--------+--------+--------+
|Column_1|Column_2|Column_3|Column_4|
+--------+--------+--------+--------+
|       1|       x|     abc|     xyz|
|       1|       x|     abc|     sdf|
|       1|       x|     def|     xyz|
|       1|       x|     def|     sdf|
+--------+--------+--------+--------+
Try
df.dropDuplicates(Array("Column_3"))
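A note on what that call returns, sketched against the sample df defined above: it keeps one arbitrary row per distinct Column_3 value, so which Column_4 value survives in each group is not guaranteed.

df.dropDuplicates(Array("Column_3")).show()
// One row per distinct Column_3 value, e.g. (row choice may vary):
// +--------+--------+--------+--------+
// |Column_1|Column_2|Column_3|Column_4|
// +--------+--------+--------+--------+
// |       1|       x|     abc|     www|
// |       1|       x|     def|     www|
// +--------+--------+--------+--------+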
Hi @Uservxn, welcome to SO :) Two questions: 1. What is your Spark version? 2. Is there any rule for which combinations of Column_3 and Column_4 to keep, i.e. how do you decide whether to keep `abc|www` or `abc|sdf`?
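Until such a rule is specified, any pairing is as good as another. A minimal sketch that reproduces the shape of the expected output from the sample df above, assuming (Column_1, Column_2) form a single constant key and that pairing the distinct values in alphabetical order is acceptable: index the distinct values of each column independently, then zip the two lists with a full outer join on the index.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Index the distinct values of each column separately.
// (Window.orderBy without partitionBy moves everything to one partition;
// fine here because the distinct lists are tiny.)
val c3 = df.select("Column_3").distinct()
  .withColumn("idx", row_number().over(Window.orderBy("Column_3")))
val c4 = df.select("Column_4").distinct()
  .withColumn("idx", row_number().over(Window.orderBy("Column_4")))

// Zip by index; the shorter list is padded with null by the outer join.
val keys = df.select("Column_1", "Column_2").distinct()  // assumes one key pair
val result = keys.crossJoin(c3.join(c4, Seq("idx"), "outer"))
  .select("Column_1", "Column_2", "Column_3", "Column_4")

result.show()
// +--------+--------+--------+--------+
// |Column_1|Column_2|Column_3|Column_4|
// +--------+--------+--------+--------+
// |       1|       x|     abc|     sdf|
// |       1|       x|     def|     www|
// |       1|       x|    null|     xyz|
// +--------+--------+--------+--------+
// (Row order may vary. The pairing differs from the question's expected
// output because values are matched alphabetically, not by row position.)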