Scala: How to store multiple DataFrames in a Map[String, DataFrame] and access each one by its key

I have multiple DataFrames that I need to store in a Map[String, DataFrame] data structure. The next goal is to access them in order to perform a join. Here are the input DataFrames:

 names_df:
 +-----+----------+----------+
 |Id   |FirstName | LastName |
 +-----+----------+----------+
 |1000 | Bob      | B        |
 |1001 | Alice    | A        |
 +-----+----------+----------+

 addresses_df:
 +----+---------+
 |Id  |Address  |
 +----+---------+
 |1000|NY       |
 |1001|Boston   |
 +----+---------+
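
For reference, a minimal sketch of how these sample inputs could be constructed (assuming a SparkSession named spark is in scope; this construction code is illustrative, not from the original post):

import spark.implicits._

// Build the sample DataFrames shown above
val names_df = Seq(
  (1000, "Bob", "B"),
  (1001, "Alice", "A")
).toDF("Id", "FirstName", "LastName")

val addresses_df = Seq(
  (1000, "NY"),
  (1001, "Boston")
).toDF("Id", "Address")
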
I created the map as follows:

import org.apache.spark.sql.DataFrame

var map_DFs = Map.empty[String, DataFrame]
map_DFs += ("Names" -> names_df)
map_DFs += ("Addresses" -> addresses_df)
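
As a side note, the same map can be built in a single immutable step, which avoids the var; a small sketch:

val map_DFs: Map[String, DataFrame] = Map(
  "Names"     -> names_df,
  "Addresses" -> addresses_df
)
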
I am loading these DataFrames and then trying to join them with the following code:

var person_df = map_DFs("Names")
person_df = person_df.join(map_DFs("Addresses"), "Id", "left")
However, this results in the following error:

notebook: error: overloaded method value join with alternatives:
  (right: org.apache.spark.sql.Dataset[_],joinExprs: org.apache.spark.sql.Column,joinType: String)org.apache.spark.sql.DataFrame <and>
  (right: org.apache.spark.sql.Dataset[_],usingColumns: Seq[String],joinType: String)org.apache.spark.sql.DataFrame
 cannot be applied to (org.apache.spark.sql.DataFrame, String, String)
     person_df = person_df.join(map_DFs("Addresses"), "Id", "left")

I was wondering if you could help me solve this problem.

Your method call is invalid. If you want to specify a join type, you must pass the join columns as a Seq:

person_df = person_df.join(map_DFs("Addresses"), Seq("Id"), "left")
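
For completeness, here is the corrected flow end to end; the show() call and the output below are illustrative, assuming the sample data shown earlier:

var person_df = map_DFs("Names")
person_df = person_df.join(map_DFs("Addresses"), Seq("Id"), "left")
person_df.show()
// Expected output with the sample data:
// +----+---------+--------+-------+
// |  Id|FirstName|LastName|Address|
// +----+---------+--------+-------+
// |1000|      Bob|       B|     NY|
// |1001|    Alice|       A| Boston|
// +----+---------+--------+-------+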

The join key must be of Column type or a sequence of strings:

import org.apache.spark.sql.functions.col
person_df = person_df.join(map_DFs("Addresses"), col("Id"), "left")
// or
import spark.implicits._
person_df = person_df.join(map_DFs("Addresses"), $"Id", "left")
// or
person_df = person_df.join(map_DFs("Addresses"), Seq("Id"), "left")
// Note: the Column variants pass a join *condition*; when both sides share
// the column name "Id", the Seq("Id") form is the unambiguous choice.

If a join type is given, the column name should be wrapped in a Scala Seq.
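
One last detail worth noting (an addition, not part of the original answers): when no join type is specified, Spark's two-argument overload join(right, usingColumn) does accept a plain String and performs the default inner join:

// A plain String column name works only for the default inner join
val inner_df = person_df.join(map_DFs("Addresses"), "Id")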