
Select values from the first table that are not in the second table in Spark, using Scala or Spark SQL

I have a sample Hive/Spark table like the following:

row_key  data_as_of_date  key   value
A        20210121         key1  value1
A        20210121         key2  value2
A        20210121         key3  value3
B        20210121         key1  value1
B        20210121         key2  value1
B        20210121         key3  value2
B        20210121         key4  value3
C        20210121         key1  value2
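You can get the expected output by doing a left anti join ("leftanti") of the two DataFrames on the key columns row_key, data_as_of_date and key: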

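```scala
import spark.implicits._  // for .toDF; `spark` is the SparkSession predefined in spark-shell and Databricks
// First table: the sample data from the question
val df = Seq(
  ("A","20210121","key1","value1"), ("A","20210121","key2","value2"), ("A","20210121","key3","value3"),
  ("B","20210121","key1","value1"), ("B","20210121","key2","value1"), ("B","20210121","key3","value2"),
  ("B","20210121","key4","value3"), ("C","20210121","key1","value2")
).toDF("row_key", "data_as_of_date", "key", "value")
// Second table: rows whose keys should be excluded from the first
val df1 = Seq(
  ("A","20210121","key1","value1"), ("A","20210121","key2","value2"), ("B","20210121","key1","value1"),
  ("B","20210121","key4","value3"), ("C","20210121","key1","value2")
).toDF("row_key", "data_as_of_date", "key", "value")
// "leftanti" keeps the rows of df that have no match in df1 on all three join columns
val outputdf = df.join(df1, Seq("row_key", "data_as_of_date", "key"), "leftanti")
outputdf.show()  // in a Databricks notebook, display(outputdf) renders it as a table
```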
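In plain terms, a left anti join returns each left-side row for which no right-side row matches on the join columns, and it returns only the left side's columns. The sketch below reproduces just those semantics over ordinary Scala collections, with no Spark involved, so the behaviour is easy to inspect. The names Rec, LeftAntiSketch and leftAnti are hypothetical, introduced only for illustration. The sketch assumes there are no null keys. In Spark, a null key never matches, so a left row with a null key is always kept, whereas the Set lookup here would treat two nulls as equal.

```scala
// Hypothetical record type mirroring the table's four columns
final case class Rec(rowKey: String, dataAsOfDate: String, key: String, value: String)

object LeftAntiSketch {
  // Keep left rows whose (rowKey, dataAsOfDate, key) triple never occurs on the right.
  // The value column is not part of the match, just as in the Spark join above.
  // Assumes no null keys (Spark never matches nulls; this Set lookup would).
  def leftAnti(left: Seq[Rec], right: Seq[Rec]): Seq[Rec] = {
    val rightKeys = right.map(r => (r.rowKey, r.dataAsOfDate, r.key)).toSet
    left.filterNot(l => rightKeys.contains((l.rowKey, l.dataAsOfDate, l.key)))
  }

  def main(args: Array[String]): Unit = {
    val d = "20210121"
    val first = Seq(
      Rec("A", d, "key1", "value1"), Rec("A", d, "key2", "value2"), Rec("A", d, "key3", "value3"),
      Rec("B", d, "key1", "value1"), Rec("B", d, "key2", "value1"), Rec("B", d, "key3", "value2"),
      Rec("B", d, "key4", "value3"), Rec("C", d, "key1", "value2"))
    val second = Seq(
      Rec("A", d, "key1", "value1"), Rec("A", d, "key2", "value2"), Rec("B", d, "key1", "value1"),
      Rec("B", d, "key4", "value3"), Rec("C", d, "key1", "value2"))
    // Prints the surviving rows in their original order
    leftAnti(first, second).foreach(println)
  }
}
```

Running main prints Rec(A,20210121,key3,value3), Rec(B,20210121,key2,value1) and Rec(B,20210121,key3,value2), one per line. Spark itself executes the join very differently and does not guarantee row order.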
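With this sample data the anti join returns three rows: those with (row_key, key) = (A, key3), (B, key2) and (B, key3). No row of df1 shares their (row_key, data_as_of_date, key) combination.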
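The question also asks about Spark SQL, which supports the same operation directly with LEFT ANTI JOIN. The sketch below registers the sample data as temporary views and runs the join as a SQL query. The view names first_table and second_table, and the names LeftAntiSql and missingRows, are hypothetical. The local[*] master is only an assumption for a standalone run. A correlated WHERE NOT EXISTS (SELECT 1 FROM second_table t2 WHERE ...) subquery should be an equivalent way to write the same filter.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LeftAntiSql {
  // Registers the two sample tables as temp views (hypothetical names) and anti-joins them in SQL
  def missingRows(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val cols = Seq("row_key", "data_as_of_date", "key", "value")
    val d = "20210121"
    Seq(("A", d, "key1", "value1"), ("A", d, "key2", "value2"), ("A", d, "key3", "value3"),
        ("B", d, "key1", "value1"), ("B", d, "key2", "value1"), ("B", d, "key3", "value2"),
        ("B", d, "key4", "value3"), ("C", d, "key1", "value2"))
      .toDF(cols: _*).createOrReplaceTempView("first_table")
    Seq(("A", d, "key1", "value1"), ("A", d, "key2", "value2"), ("B", d, "key1", "value1"),
        ("B", d, "key4", "value3"), ("C", d, "key1", "value2"))
      .toDF(cols: _*).createOrReplaceTempView("second_table")

    // Rows of first_table with no matching (row_key, data_as_of_date, key) in second_table
    spark.sql(
      """SELECT t1.*
        |FROM first_table t1
        |LEFT ANTI JOIN second_table t2
        |  ON  t1.row_key = t2.row_key
        |  AND t1.data_as_of_date = t2.data_as_of_date
        |  AND t1.key = t2.key""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    // Standalone local session; in spark-shell or Databricks, pass the existing `spark` instead
    val spark = SparkSession.builder().master("local[*]").appName("left-anti-sql").getOrCreate()
    missingRows(spark).show()
    spark.stop()
  }
}
```

Like the DataFrame version, the SQL form outputs only the left table's columns. This is what distinguishes an anti join from a LEFT OUTER JOIN filtered on IS NULL, which would also carry the right table's (all-null) columns.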