Writing SQL query results to a DataFrame using Scala fails in Databricks


Just running this Spark SQL query in Databricks works fine:

%sql
select CONCAT(`tsArr[1]`,"-", `tsArr[0]`,"-", `tsArr[2]`," ", `tsArr[3]`) as time,
  cast (context._function as string) as funct, 
  cast (context._param as string) as param, 
  cast(context._value as string) as value from clickstreamDF
  lateral view explode(Context) as context
This produces:

time                funct   param           value
11-27-2017 08:20:33 Open    location        3424
11-27-2017 08:20:33 Open    Company Id      testinc
11-27-2017 08:20:33 Open    Channel Info    1
11-27-2017 08:20:33 Open    UserAgent       jack
11-27-2017 08:20:33 Open    Language        english
But when I want to put the query result into a DataFrame like this:

%scala    
val df_header = spark.sql(s"select CONCAT(`tsArr[1]`,"-", `tsArr[0]`,"-", `tsArr[2]`," ", `tsArr[3]`) as time,
  cast (context._function as string) as funct,
  cast (context._param as string) as param,
  cast(context._value as string) as value
  from clickstreamDF lateral view explode(Context) as context")

df_header.createOrReplaceTempView("clickstreamDF")
then it fails. It says:

error: ')' expected but string literal found.

I guess it has to do with the "-" and " ". I have tried replacing or escaping them with "" and ``, or not using "" at all, but no luck. What am I doing wrong?

Regards,


D.

To avoid ambiguity between the quotes that enclose the entire Spark SQL string (i.e. ") and the quotes used inside the SQL statement, use triple quotes (""") for the enclosing quotes. You will also need to remove the backticks around those tsArr[...] references, as in the following example:

import org.apache.spark.sql.functions._
import spark.implicits._

case class CT(_function: String, _param: String, _value: String)

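// sample data shaped like the question's view: timestamp parts plus an array of context structs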
val clickstreamDF = Seq(
  (Seq("27", "11", "2017", "08:20:33"), Seq(CT("f1", "p1", "v1"), CT("f2", "p2", "v2"))),
  (Seq("28", "12", "2017", "09:30:44"), Seq(CT("f3", "p3", "v3")))
).toDF("tsArr", "contexts")

clickstreamDF.createOrReplaceTempView("clickstreamTable")

val df_header = spark.sql("""
  select
    concat(tsArr[1], "-", tsArr[0], "-", tsArr[2], " ", tsArr[3]) as time,
    cast(context._function as string) as funct,
    cast(context._param as string) as param,
    cast(context._value as string) as value
  from
    clickstreamTable lateral view explode(contexts) as context
""")

df_header.show
// +-------------------+-----+-----+-----+
// |               time|funct|param|value|
// +-------------------+-----+-----+-----+
// |11-27-2017 08:20:33|   f1|   p1|   v1|
// |11-27-2017 08:20:33|   f2|   p2|   v2|
// |12-28-2017 09:30:44|   f3|   p3|   v3|
// +-------------------+-----+-----+-----+
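
As an aside (a sketch of my own, not part of the original answer): Spark SQL also accepts single-quoted string literals, so another way to avoid the clash, against the same clickstreamTable view, is to keep an ordinary Scala string and use ' inside the SQL (df_singleQuoted is a hypothetical name):

// single quotes inside the SQL never collide with the Scala double quotes
val df_singleQuoted = spark.sql(
  "select concat(tsArr[1], '-', tsArr[0], '-', tsArr[2], ' ', tsArr[3]) as time, " +
  "cast(context._function as string) as funct, " +
  "cast(context._param as string) as param, " +
  "cast(context._value as string) as value " +
  "from clickstreamTable lateral view explode(contexts) as context")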

BTW, you may want to consider using the DataFrame API since you already have the data in a DataFrame:

val df_header = clickstreamDF.
  withColumn("time",
    concat($"tsArr"(1), lit("-"), $"tsArr"(0), lit("-"), $"tsArr"(2), lit(" "), $"tsArr"(3))
  ).
  withColumn("context", explode($"contexts")).
  select($"time",
    $"context._function".cast("String").as("funct"),
    $"context._param".cast("String").as("param"),
    $"context._value".cast("String").as("value")
  )
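
A small variation on the same idea (again a sketch of my own; df_header2 is a hypothetical name): concat_ws joins the date parts with "-" in one call, leaving only the space separator as an explicit lit:

import org.apache.spark.sql.functions._
import spark.implicits._

// concat_ws("-", ...) joins the first three tsArr elements with dashes;
// the time-of-day part is then appended with a single space
val df_header2 = clickstreamDF.
  withColumn("time",
    concat(concat_ws("-", $"tsArr"(1), $"tsArr"(0), $"tsArr"(2)), lit(" "), $"tsArr"(3))
  ).
  withColumn("context", explode($"contexts")).
  select($"time",
    $"context._function".cast("String").as("funct"),
    $"context._param".cast("String").as("param"),
    $"context._value".cast("String").as("value")
  )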

Hi Leo, thanks for helping me out. Really appreciate your support! Works like a charm.