Dataframe: how to merge 2 or more dataframes with pyspark
I have 3 pyspark dataframes, and I need to merge them all into a single dataframe, like:
+--------+
| name   |
|--------|
| orange |
+--------+

+--------+
| age    |
|--------|
| 10     |
+--------+

+---------+
| place   |
|---------|
| delhi   |
+---------+
The output df should look like this:
+---------+---------+---------+
| name    | age     | place   |
|---------+---------+---------|
| orange  | 10      | delhi   |
+---------+---------+---------+
Does anyone know a solution? Thanks in advance.

For this case, use a cross join or row_number over a window.
Example:
df = spark.createDataFrame([('orange',)], ['name'])
df1 = spark.createDataFrame([(10,)], ['age'])
df2 = spark.createDataFrame([('delhi',)], ['place'])

# combine the single-row dataframes column-wise
df.crossJoin(df1).crossJoin(df2).show()
#+------+---+-----+
#| name|age|place|
#+------+---+-----+
#|orange| 10|delhi|
#+------+---+-----+
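Note that crossJoin returns the full Cartesian product. With single-row dataframes that is exactly one combined row, but with multi-row dataframes the row counts multiply, so in that case the window approach below is the one that lines rows up positionally.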
# using a window
from pyspark.sql import Window
from pyspark.sql.functions import lit, row_number

# give every dataframe the same synthetic row number, then join on it;
# ordering by a constant pulls all rows into a single partition
w = Window.orderBy(lit(1))
df = df.withColumn("rn", row_number().over(w))
df1 = df1.withColumn("rn", row_number().over(w))
df2 = df2.withColumn("rn", row_number().over(w))
df.join(df1, ['rn'], 'inner').join(df2, ['rn'], 'inner').drop('rn').show()
#+------+---+-----+
#| name|age|place|
#+------+---+-----+
#|orange| 10|delhi|
#+------+---+-----+
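Both snippets assume a spark session already exists, as it does in the pyspark shell or a notebook. If you run this as a standalone script, a minimal setup sketch (the app name here is an arbitrary placeholder):

from pyspark.sql import SparkSession

# create the session that the snippets above refer to as `spark`
spark = SparkSession.builder.appName("merge-dfs").getOrCreate()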