Replacing Spark SQL with Scala API calls

I have a simple left join:
spark.sql(
  s"""
     |SELECT
     |  a.*,
     |  b.company_id AS companyId
     |FROM profile_views a
     |LEFT JOIN companies_info b
     |  ON a.memberId = b.member_id
     |""".stripMargin
).createOrReplaceTempView("company_views")
How can I replace this with the Scala API?
Try the code below; it works both for temp views and for Hive tables.
val profile_views = spark
  .table("profile_views")
  .as("a")

val companies_info = spark
  .table("companies_info")
  .select($"company_id".as("companyId"), $"member_id".as("memberId"))
  .as("b")

profile_views
  .join(companies_info, Seq("memberId"), "left")
  .createOrReplaceTempView("company_views")
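As a sanity check, here is a minimal, self-contained sketch of the same pipeline with made-up rows (the table and column names are from the question; the data and the local SparkSession are assumptions for illustration). Renaming member_id to memberId is what makes the Seq("memberId") form possible: an equi-join on a shared column name keeps a single memberId column in the output, and unmatched left-side rows get a null companyId.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("join-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical miniature versions of the two tables from the question.
Seq(("m1", 10), ("m2", 20))
  .toDF("memberId", "viewCount")
  .createOrReplaceTempView("profile_views")
Seq(("m1", "c1"))
  .toDF("member_id", "company_id")
  .createOrReplaceTempView("companies_info")

val profile_views = spark.table("profile_views").as("a")
val companies_info = spark.table("companies_info")
  .select($"company_id".as("companyId"), $"member_id".as("memberId"))
  .as("b")

// Seq("memberId") performs an equi-join on the shared column name and
// keeps only one copy of memberId; "left" keeps unmatched left rows
// with a null companyId.
profile_views
  .join(companies_info, Seq("memberId"), "left")
  .createOrReplaceTempView("company_views")

spark.table("company_views").show()
```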
If the data is already in DataFrames, you can use the code below.
profile_viewsDF.as("a")
  .join(
    companies_infoDF.select($"company_id".as("companyId"), $"member_id".as("memberId")).as("b"),
    Seq("memberId"),
    "left"
  )
  .createOrReplaceTempView("company_views")
Update: a temp view can also be accessed with spark.table(). Check the code below.
scala> val df = Seq(("srinivas",10)).toDF("name","age")
df: org.apache.spark.sql.DataFrame = [name: string, age: int]
scala> df.createTempView("person")
scala> spark.table("person").show
+--------+---+
| name|age|
+--------+---+
|srinivas| 10|
+--------+---+
Is this a left join or an inner join? Also, what would the call look like if I had to do an inner join?
I have already specified `left` for the left join; for an inner join it would be `inner`.
Both profile_views and companies_info are temp views. Does this still work?
From the question above, he is selecting all columns from the profile_views table and one column from companies_info, and the join column is needed as well. If you need more columns from companies_info, just select them before joining.
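To make the left-versus-inner distinction concrete, here is a tiny plain-Scala sketch with made-up member and company ids (no Spark involved):

```scala
// Hypothetical miniature of the two tables, keyed by member id.
val profileViews = List("m1", "m2", "m3")            // members with a profile view
val companiesInfo = Map("m1" -> "c1", "m3" -> "c9")  // member_id -> company_id

// Left join: every left-side row survives; a missing match becomes None,
// just as the DataFrame left join fills companyId with null.
val leftJoined: List[(String, Option[String])] =
  profileViews.map(m => m -> companiesInfo.get(m))
// List((m1,Some(c1)), (m2,None), (m3,Some(c9)))

// Inner join: only members present on both sides survive.
val innerJoined: List[(String, String)] =
  profileViews.flatMap(m => companiesInfo.get(m).map(c => m -> c))
// List((m1,c1), (m3,c9))
```

With the DataFrame API the switch is just the third argument to join: "left" for the query above, "inner" to drop the unmatched rows.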