How do I perform a join operation on PySpark DataFrames?
Tags: dataframe, pyspark, apache-spark-sql
I have two DataFrames, dd1 and dd2, and I want to join them. I would like the output in the dd1 DataFrame to look like this:
id name name1
1 red banana
2 green apple
3 yellow NULL
4 black orange
5 pink NULL
6 blue NULL
7 white NULL
8 grey grapes
You can try the following code, which uses a left join so that every row of the first DataFrame is kept:
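from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # already available as `spark` in most notebooks
df = spark.createDataFrame(
    [(1, 'red'), (2, 'green'), (3, 'yellow'), (4, 'black'), (5, 'pink'),
     (6, 'blue'), (7, 'white'), (8, 'grey')], ["id", "name"])
df.show()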
+---+------+
| id|  name|
+---+------+
|  1|   red|
|  2| green|
|  3|yellow|
|  4| black|
|  5|  pink|
|  6|  blue|
|  7| white|
|  8|  grey|
+---+------+
df1 = spark.createDataFrame(
    [(1, 'banana'), (2, 'apple'), (4, 'orange'), (8, 'grapes'), (9, 'lemon')],
    ["id1", "name1"])
df1.show()
+---+------+
|id1| name1|
+---+------+
|  1|banana|
|  2| apple|
|  4|orange|
|  8|grapes|
|  9| lemon|
+---+------+
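# Left join: keep every row of df; rows without a match in df1 get null in name1.
condition = [df.id == df1.id1]
left_join = df.join(df1, condition, how='left')
left_join = left_join.drop("id1")      # id1 duplicates id after the join
left_join = left_join.orderBy("id")
left_join.show()                       # in Databricks notebooks, display(left_join) also works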
+---+------+------+
| id|  name| name1|
+---+------+------+
|  1|   red|banana|
|  2| green| apple|
|  3|yellow|  null|
|  4| black|orange|
|  5|  pink|  null|
|  6|  blue|  null|
|  7| white|  null|
|  8|  grey|grapes|
+---+------+------+
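To see why ids 3, 5, 6 and 7 end up with null and why id 9 from the second DataFrame disappears, it may help to look at left-join semantics in plain Python. The following is only an illustrative sketch, not how Spark executes joins. The helper name `left_join_rows` is hypothetical, and it assumes each key appears at most once on the right side, as in the sample data.

```python
# Plain-Python sketch of left-join semantics (illustration only, not Spark's implementation).
left_rows = [(1, "red"), (2, "green"), (3, "yellow"), (4, "black"),
             (5, "pink"), (6, "blue"), (7, "white"), (8, "grey")]
right_rows = [(1, "banana"), (2, "apple"), (4, "orange"),
              (8, "grapes"), (9, "lemon")]


def left_join_rows(left, right):
    """Return (id, name, name1) for every left row; name1 is None when unmatched."""
    # Index the right side by key. Assumption: each id1 occurs at most once.
    lookup = {id1: name1 for id1, name1 in right}
    # Every left row is emitted exactly once; right-only keys (such as 9) never appear.
    return [(id_, name, lookup.get(id_)) for id_, name in left]


joined = left_join_rows(left_rows, right_rows)
for row in joined:
    print(row)
```

Each left row is kept whether or not it has a partner, which corresponds to `how='left'` in the PySpark code; `None` here plays the role of Spark's null.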