Scala: How do I join DataFrames (from a collection of Datasets)?

Tags: scala, apache-spark, apache-spark-sql

I'm trying to work out the best way to join a collection of Spark DataFrames.

Example: List(df1, df2, df3, dfN), where every df has a date I can join on.

Recursion?

Like this:

List(df1,df2,df3,dfN).reduce((a, b) => a.join(b, joinCondition))
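
As a concrete sketch of that reduce approach (assuming every frame shares a date column literally named "date"; df1, df2, df3 are illustrative names, not from the original post):

import org.apache.spark.sql.DataFrame

// Illustrative frames: df1, df2, df3 are each assumed to carry a "date" column.
// reduce folds the list pairwise, joining the running result with the next frame;
// joining on Seq("date") keeps a single date column instead of one per input.
val dfs: List[DataFrame] = List(df1, df2, df3)
val joined: DataFrame = dfs.reduce((a, b) => a.join(b, Seq("date")))

One caveat: reduce throws on an empty list, so dfs.reduceOption(_.join(_, Seq("date"))) is safer when the collection might be empty.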

Here is the same answer for PySpark users:

from functools import reduce
from pyspark.sql.functions import coalesce

dfslist = [...]  # list of all DataFrames you want to join
merge = lambda df1, df2: (df1.join(df2, df1.joinKey == df2.joinKey, "outer")
                          .select("*", coalesce(df1.joinKey, df2.joinKey).alias("joinKey"))
                          .drop(df1.joinKey).drop(df2.joinKey))
mergedDf = reduce(merge, dfslist)
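
Two details of this pattern: the "outer" join keeps rows that appear in only some of the frames, and coalesce(df1.joinKey, df2.joinKey) collapses the two key columns the outer join produces back into a single joinKey before the originals are dropped.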

I had already done it with recursion, but this seems clearer. Here is my recursive version:
def recursiveJoinOnDate(list: List[DataFrame]): DataFrame = {
  if (list.isEmpty) null
  else if (list.size > 1) list.head.join(recursiveJoinOnDate(list.tail), "Date")
  else list.head
}
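
A minimal usage sketch (again assuming illustrative frames df1, df2, df3 that share a "Date" column):

val joined = recursiveJoinOnDate(List(df1, df2, df3))

Returning null for the empty-list case is un-idiomatic Scala; returning an Option[DataFrame], or simply calling list.reduceOption(_.join(_, "Date")), avoids the null entirely.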