Replacing Spark SQL with Scala API calls


I have a simple left join:

spark.sql(
      s"""
         |SELECT
         |  a.*,
         |  b.company_id AS companyId
         |FROM profile_views a
         |LEFT JOIN companies_info b
         |  ON a.memberId = b.member_id
         |""".stripMargin
    ).createOrReplaceTempView("company_views")
How can I replace it with the Scala API?

The code below works for both temp views and Hive tables:

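// The $"..." column syntax needs spark.implicits._ (already imported in spark-shell).
import spark.implicits._

// Left side: all columns of profile_views.
val profile_views = spark
  .table("profile_views")
  .as("a")

// Right side: only the needed column plus the join key, renamed to match the left side.
val companies_info = spark
  .table("companies_info")
  .select($"company_id".as("companyId"), $"member_id".as("memberId"))
  .as("b")

profile_views
  .join(companies_info, Seq("memberId"), "left")
  .createOrReplaceTempView("company_views")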
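
For comparison, here is a minimal sketch that mirrors the SQL more literally: it joins on an explicit condition and then selects a.* plus the renamed column, instead of renaming the key before the join. The sample rows and the page column are invented for illustration, and the local SparkSession is only an assumption so the snippet can run on its own; in spark-shell the existing spark session would be used instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("left-join-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for the real tables.
Seq((1, "home"), (2, "search"))
  .toDF("memberId", "page")
  .createOrReplaceTempView("profile_views")
Seq((1, 100))
  .toDF("member_id", "company_id")
  .createOrReplaceTempView("companies_info")

val a = spark.table("profile_views").as("a")
val b = spark.table("companies_info").as("b")

// Same shape as the SQL: a.* plus b.company_id AS companyId.
val companyViews = a
  .join(b, col("a.memberId") === col("b.member_id"), "left")
  .select(col("a.*"), col("b.company_id").as("companyId"))

companyViews.createOrReplaceTempView("company_views")
```

With an explicit join condition both key columns stay available until the final select, so nothing has to be renamed up front. The Seq("memberId") form used above is shorter when the key names can be made to match.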

If the data is already in DataFrames, you can use the code below:

profile_viewsDF.as("a")
.join(
    companies_infoDF.select($"company_id".as("companyId"),$"member_id".as("memberId")).as("b"),
    Seq("memberId"),
    "left"
)
.createOrReplaceTempView("company_views")

Update: temp views can also be read with spark.table(). See the code below:


scala> val df = Seq(("srinivas",10)).toDF("name","age")
df: org.apache.spark.sql.DataFrame = [name: string, age: int]

scala> df.createTempView("person")

scala> spark.table("person").show
+--------+---+
|    name|age|
+--------+---+
|srinivas| 10|
+--------+---+

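Is this a left join or an inner join? Also, if I had to do an inner join, what would the call look like?

I have specified "left" as the join type. For an inner join it would be "inner".

profile_views and companies_info are both temp views. Does this still work?

Based on the question above, he is selecting all columns from the profile_views table and one column from companies_info, and the join column is needed as well. If you need more columns from companies_info, just select them before the join.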
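
The two points from the comments (switching the join type, and selecting extra columns before the join) can be sketched as follows. The sample data, the page column, and the company_name column are hypothetical and exist only to make the snippet self-contained. The local SparkSession is likewise an assumption for running it outside spark-shell.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("join-type-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data; company_name is an invented extra column.
val profileViews  = Seq((1, "home"), (2, "search")).toDF("memberId", "page")
val companiesInfo = Seq((1, 100, "Acme")).toDF("member_id", "company_id", "company_name")

// Select (and rename) every column needed from the right side before joining.
val right = companiesInfo.select(
  $"member_id".as("memberId"),
  $"company_id".as("companyId"),
  $"company_name".as("companyName")
)

// "left" keeps profile views without a matching company row.
val leftJoined = profileViews.join(right, Seq("memberId"), "left")

// "inner" drops profile views without a matching company row.
val innerJoined = profileViews.join(right, Seq("memberId"), "inner")
```

Only the third argument changes between the two calls. With this sample data the left join keeps the row for member 2 with null company columns, while the inner join drops it.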