Left outer join for unequal records of two DataFrames in Spark Scala
I have two DataFrames. The first DataFrame is:
+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+
|DataPartition|TimeStamp |OrganizationID|SourceID|_auditorId|sr:AuditorEnumerationId|sr:AuditorOpinionCode|sr:AuditorOpinionId|sr:IsPlayingAuditorRole|sr:IsPlayingCSRAuditorRole|sr:IsPlayingTaxAdvisorRole|FFAction|!||
+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+
|Japan |2018-05-03T09:52:48+00:00|4295876589 |195 |null |null |null |null |null |null |null |O|!| |
|Japan |2018-05-03T08:10:19+00:00|4295876589 |196 |null |null |null |null |null |null |null |D|!| |
|Japan |2018-05-03T09:52:48+00:00|4295876589 |194 |null |null |null |null |null |null |null |O|!| |
+-------------+-------------------------+--------------+--------+----------+-----------------------+---------------------+-------------------+-----------------------+--------------------------+--------------------------+-----------+
The second DataFrame is:
DataPartition TimeStamp OrganizationID SourceID _auditorId sr:AuditorEnumerationId sr:AuditorOpinionCode sr:AuditorOpinionId sr:IsPlayingAuditorRole sr:IsPlayingCSRAuditorRole sr:IsPlayingTaxAdvisorRole FFAction|!|
Japan 2018-05-03T08:06:06+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T08:06:06+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T09:48:33+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:48:33+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T07:27:10+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:27:10+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:27:10+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:35:42+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:35:42+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:35:42+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T09:34:46+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:34:46+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T08:10:19+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T08:10:19+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T07:28:16+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:28:16+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:28:16+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-02T09:05:04+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-02T09:05:04+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-02T09:05:04+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:31:28+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:31:28+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:31:28+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:22:58+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:22:58+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:22:58+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T09:45:22+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:45:22+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T07:11:26+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:11:26+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:11:26+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:00:45+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:00:45+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:00:45+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:36:47+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:36:47+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:36:47+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:01:52+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:01:52+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:01:52+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-02T10:28:22+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-02T10:28:22+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-02T10:28:22+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T09:52:48+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:52:48+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T09:41:09+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:41:09+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-02T10:30:32+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-02T10:30:32+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-02T10:30:32+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T06:56:32+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T06:56:32+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T06:56:32+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T07:05:04+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:05:04+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:05:04+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
Japan 2018-05-03T09:38:59+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T09:38:59+00:00 4295876589 195 16157 1002485247 UWE 3010547 true false false O|!|
Japan 2018-05-03T07:08:14+00:00 4295876589 194 2719 3023331 AOP 3010542 true false true O|!|
Japan 2018-05-03T07:08:14+00:00 4295876589 195 5937 3026578 NOP 3010543 true false true O|!|
Japan 2018-05-03T07:08:14+00:00 4295876589 196 3252 3024053 ONC 3020538 true false true O|!|
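Now I want to add the rows of DataFrame 1 to DataFrame 2, except for rows whose TimeStamp, OrganizationID and SourceID values already match a row in DataFrame 2.
So in this case, the DataFrame 1 records with SourceID 194 and 195 will not be added to DataFrame 2, because their TimeStamp | OrganizationID | SourceID values match rows in both DataFrames.
Only the one row with SourceID 196 should be added.
Will a left outer join work in this case?
When I do that, I get duplicate columns.
In short: rows from DataFrame 1 that match DataFrame 2 on these three columns should not be added, and all other rows should be added.

You can try a leftanti join and then union the result with df2: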
df1.join(df2, Seq("TimeStamp", "OrganizationID", "SourceID"), "leftanti").union(df2)
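To make the suggestion easier to try out, here is a minimal self-contained sketch. It is not the asker's actual pipeline: the object name `AppendUnmatched`, the helper names, the local SparkSession, and the trimmed-down sample rows (a single `AuditorOpinionCode` column standing in for all the `sr:*` columns) are assumptions made for illustration. One detail worth checking in your own setup: as far as I know, a join on a `Seq` of column names puts the key columns first in the result, so the sketch uses `unionByName` (Spark 2.3+) rather than the positional `union` to keep columns aligned.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AppendUnmatched {
  // Key columns named in the question.
  val keys: Seq[String] = Seq("TimeStamp", "OrganizationID", "SourceID")

  // Rows of df1 that have no key match in df2, followed by all rows of df2.
  // A left anti join returns only df1's columns, so no duplicate columns appear.
  def appendUnmatched(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, keys, "left_anti").unionByName(df2)

  // Builds hypothetical, trimmed-down versions of the two DataFrames and combines them.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val cols = Seq("DataPartition", "TimeStamp", "OrganizationID", "SourceID", "AuditorOpinionCode")

    val df1 = Seq(
      ("Japan", "2018-05-03T09:52:48+00:00", 4295876589L, 195, Option.empty[String]),
      ("Japan", "2018-05-03T08:10:19+00:00", 4295876589L, 196, Option.empty[String]),
      ("Japan", "2018-05-03T09:52:48+00:00", 4295876589L, 194, Option.empty[String])
    ).toDF(cols: _*)

    val df2 = Seq(
      ("Japan", "2018-05-03T09:52:48+00:00", 4295876589L, 194, Option("AOP")),
      ("Japan", "2018-05-03T09:52:48+00:00", 4295876589L, 195, Option("UWE")),
      ("Japan", "2018-05-03T08:10:19+00:00", 4295876589L, 194, Option("AOP")),
      ("Japan", "2018-05-03T08:10:19+00:00", 4295876589L, 195, Option("UWE"))
    ).toDF(cols: _*)

    appendUnmatched(df1, df2)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("append-unmatched")
      .getOrCreate()
    sample(spark).show(truncate = false)
    spark.stop()
  }
}
```

With these sample rows, only the SourceID 196 row from the first DataFrame survives the anti join, so the combined result has the four rows of the second DataFrame plus that one row.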
What should your final DataFrame look like?
Si ti realyl os muhc effotr ot tyep teh titel properyl?
No, if I use this I get duplicate records.