Apache Spark: Joining two Spark DataFrames on multiple fields


apache-spark dataframe join


I am trying to join two DataFrames in Spark on multiple fields. I tried this:

df1.
   join(df2, df1$col1 == df2$col2 && df1$col3 == df2$col4)

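But this does not work (it produces a series of errors, which I can list if needed).

Is there a better way to write this? I need to do this in Spark itself, not in PySpark or similar.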

In PySpark, I had to wrap each condition in a set of parentheses because of an operator precedence problem.

Maybe you have the same problem:

df1.
   join(df2, (df1$col1 == df2$col2) && (df1$col3 == df2$col4))
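The PySpark precedence issue mentioned in this answer can be checked without Spark at all. The following is a minimal sketch in plain Python that uses the standard `ast` module to show how Python parses a condition with and without parentheses. The names `a`, `b`, `c`, `d` are placeholders standing in for column expressions, and `describe` is a helper made up for this illustration. In Python, `&` binds more tightly than `==`, so the unparenthesized form is read as a chained comparison around `b & c` rather than as two comparisons combined with `&`.

```python
import ast


def describe(expr: str) -> str:
    """Return the type name of the top-level node Python builds for expr."""
    node = ast.parse(expr, mode="eval").body
    return type(node).__name__


# Placeholder names standing in for column expressions (assumption for illustration).
unparenthesized = "a == b & c == d"
parenthesized = "(a == b) & (c == d)"

# Without parentheses the top-level node is a chained comparison:
# a == (b & c) == d
top_unparenthesized = describe(unparenthesized)

# With parentheses the top-level node is a single & whose operands
# are the two comparisons.
top_parenthesized = describe(parenthesized)

print(top_unparenthesized, top_parenthesized)
```

This only illustrates Python's parsing rules; it says nothing about how Scala parses the equivalent expression.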

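If the DataFrames are df1 and df2, you need to do the following. Note that in Scala, columns are compared with `===` rather than `==`: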

df1.join(df2, (df1("col1") === df2("col2")) && (df1("col3") === df2("col4")))

Hope this helps.

The following works well for me:

result = (
        df1
        .join(
            df2,
            (df1.col1 == df2.col1) &
            (df1.col2 == df2.col2) &
            (df1.col3 == df2.col3),
            how="left"
        )
)

This approach also worked for me in PySpark. Hope it helps:

df1.join(df2, (df1["col1"] == df2["col2"]) & \
(df1["col3"] == df2["col4"]))
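When there are more than a couple of column pairs, writing each parenthesized comparison by hand gets repetitive. The sketch below shows one way to fold a list of pairs into a single condition with `functools.reduce`. `build_join_condition` is a hypothetical helper, not part of PySpark. It relies only on `[]`, `==` and `&`, so it is demonstrated here with plain dicts to keep it runnable without Spark. Since PySpark DataFrames support `df["name"]` lookups and Column objects overload `==` and `&`, the same function should produce a Column expression when given DataFrames, but treat that as an assumption to verify in your own environment.

```python
from functools import reduce
from operator import and_


def build_join_condition(left, right, column_pairs):
    """Combine left[l] == right[r] for every (l, r) pair using &."""
    if not column_pairs:
        raise ValueError("column_pairs must not be empty")
    comparisons = [left[l] == right[r] for l, r in column_pairs]
    return reduce(and_, comparisons)


# Stand-in data: plain dicts instead of DataFrames, so this runs without Spark.
left_row = {"col1": 1, "col3": "x"}
right_row = {"col2": 1, "col4": "x"}
pairs = [("col1", "col2"), ("col3", "col4")]

matches = build_join_condition(left_row, right_row, pairs)
print(matches)
```

With real DataFrames the call would look like `df1.join(df2, build_join_condition(df1, df2, pairs))`. Because each comparison is built separately before being combined, the precedence problem does not arise.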
