How do I perform a join operation on PySpark DataFrames?


I have two DataFrames, dd1 and dd2, and I want to join them.

dd1:

dd2:

I would like the output, based on the dd1 DataFrame, to look like this:

id name     name1
 1  red     banana
 2  green   apple
 3  yellow  NULL
 4  black   orange
 5  pink    NULL 
 6  blue    NULL
 7  white   NULL
 8  grey    grapes
You can try the following code:

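from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the session if one already exists
df = spark.createDataFrame(
    [(1,'red'),(2,'green'),(3,'yellow'),(4,'black'),(5,'pink'),
    (6,'blue'),(7,'white'),(8,'grey')], ["id", "name"])
df.show()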

+---+------+
| id|  name|
+---+------+
|  1|   red|
|  2| green|
|  3|yellow|
|  4| black|
|  5|  pink|
|  6|  blue|
|  7| white|
|  8|  grey|
+---+------+

df1 = spark.createDataFrame(
    [(1,'banana'),(2,'apple'),(4,'orange'),(8,'grapes'),(9,'lemon')], ["id1", "name1"])
df1.show()

+---+------+
|id1| name1|
+---+------+
|  1|banana|
|  2| apple|
|  4|orange|
|  8|grapes|
|  9| lemon|
+---+------+

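# Left join: keep every row of df and attach name1 where id matches id1.
condition = [df.id == df1.id1]
left_join = df.join(df1, condition, how='left')

# Drop the redundant key column that came from df1, then sort by id.
left_join = left_join.drop("id1")
left_join = left_join.orderBy("id")

# show() prints the table below; in a Databricks notebook, display(left_join) also works.
left_join.show()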

+---+------+------+
| id|  name| name1|
+---+------+------+
|  1|   red|banana|
|  2| green| apple|
|  3|yellow|  null|
|  4| black|orange|
|  5|  pink|  null|
|  6|  blue|  null|
|  7| white|  null|
|  8|  grey|grapes|
+---+------+------+
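
To see why ids 3, 5, 6 and 7 end up with null while id 9 disappears, here is a minimal plain-Python sketch of left-join semantics. It is not Spark code and not how Spark implements joins; the names `left_rows`, `right_rows` and `left_join` are made up for illustration, and it assumes each id occurs at most once on the right side.

```python
# Plain-Python illustration of left-join semantics (not Spark code).
# The rows mirror the two example DataFrames above.
left_rows = [(1, "red"), (2, "green"), (3, "yellow"), (4, "black"),
             (5, "pink"), (6, "blue"), (7, "white"), (8, "grey")]
right_rows = [(1, "banana"), (2, "apple"), (4, "orange"),
              (8, "grapes"), (9, "lemon")]


def left_join(left, right):
    """Keep every left row; attach the right value when the key matches, else None."""
    lookup = dict(right)  # assumption: each right-side id appears at most once
    return [(key, name, lookup.get(key)) for key, name in left]


joined = left_join(left_rows, right_rows)
for row in joined:
    print(row)
```

Every row from the left side survives, unmatched keys get `None` (shown as null by Spark), and the right-only key 9 never appears. That is the behaviour `how='left'` asks for in `df.join(df1, condition, how='left')`.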
