Scala: How do I parse JSON records in Structured Streaming?
I am developing a Spark Structured Streaming application and am trying to parse JSON given in the following format:
{"name":"xyz","age":29,"details":["city":"mumbai","country":"India"]}
{"name":"abc","age":25,"details":["city":"mumbai","country":"India"]}
Below is my Spark code for parsing the JSON:
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._
val schema = new StructType()
  .add("name", DataTypes.StringType)
  .add("age", DataTypes.IntegerType)
  .add("details",
    new StructType()
      .add("city", DataTypes.StringType)
      .add("country", DataTypes.StringType)
  )
val dfLogLines = dfRawData.selectExpr("CAST(value AS STRING)") //Converting binary to text
val personNestedDf = dfLogLines.select(from_json($"value", schema).as("person"))
val personFlattenedDf = personNestedDf.selectExpr("person.name", "person.age")
personFlattenedDf.printSchema()
personFlattenedDf.writeStream.format("console").option("checkpointLocation",checkpoint_loc3).start().awaitTermination()
Output:
root
|-- name: string (nullable = true)
|-- age: integer (nullable = true)
-------------------------------------------
Batch: 0
-------------------------------------------
+----+----+
|name| age|
+----+----+
|null|null|
|null|null|
+----+----+
The code does not throw any error, but it returns null values in the output. What am I doing wrong?
Thanks in advance.

tl;dr The JSON is not well-formed in the details field.
From the documentation of the from_json standard function:
Returns null, in the case of an unparseable string.
The problem is with the details field:
{"details":["city":"mumbai","country":"India"]}
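It looks like an array or a map, but it matches neither: the square brackets suggest an array, while the key:value pairs inside them would only be valid within curly braces.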
scala> Seq(Array("one", "two")).toDF("value").toJSON.show(truncate = false)
+-----------------------+
|value                  |
+-----------------------+
|{"value":["one","two"]}|
+-----------------------+
scala> Seq(Map("one" -> "two")).toDF("value").toJSON.show(truncate = false)
+-----------------------+
|value                  |
+-----------------------+
|{"value":{"one":"two"}}|
+-----------------------+
scala> Seq(("mumbai", "India")).toDF("city", "country").select(struct("city", "country") as "details").toJSON.show(truncate = false)
+-----------------------------------------------+
|value                                          |
+-----------------------------------------------+
|{"details":{"city":"mumbai","country":"India"}}|
+-----------------------------------------------+
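The last example above writes details as a JSON object, which is the shape the struct schema in the question expects. Below is a minimal batch sketch of that schema applied to such a record. It assumes a local SparkSession and uses a static DataFrame in place of the streaming source; the names wellFormed and parsed are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

// Assumption: a local session, created here only so the sketch is self-contained.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("from-json-sketch")
  .getOrCreate()
import spark.implicits._

// Same schema as in the question: details is a nested struct.
val schema = new StructType()
  .add("name", StringType)
  .add("age", IntegerType)
  .add("details", new StructType()
    .add("city", StringType)
    .add("country", StringType))

// The record from the question, but with details written as a JSON object.
val wellFormed = Seq(
  """{"name":"xyz","age":29,"details":{"city":"mumbai","country":"India"}}"""
).toDF("value")

// Parse the string column, then pull out top-level and nested fields.
val parsed = wellFormed
  .select(from_json($"value", schema).as("person"))
  .select("person.name", "person.age", "person.details.city")

parsed.show(truncate = false)
```

If details is really meant to hold several entries, the schema would need an array type instead, and the input would have to be a valid JSON array of objects.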
My recommendation would be to do the JSON parsing yourself using a user-defined function (UDF).

Thanks! The JSON was malformed.
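If the producer of the records cannot be fixed, one possible shape for the UDF route is a small repair step that runs before from_json. The sketch below is not a full JSON parser and is not taken from the answer: repairDetails and DetailsList are hypothetical names, and the regex assumes the bracketed details body never contains nested brackets or bracket characters inside string values.

```scala
import scala.util.matching.Regex

// Hypothetical helper: matches "details":[ ... ] where the bracketed body
// contains no further '[' or ']' characters.
val DetailsList: Regex = """"details"\s*:\s*\[([^\[\]]*)\]""".r

// Rewrites the bracketed key:value list into a JSON object; any input that
// does not match the pattern is returned unchanged.
def repairDetails(raw: String): String =
  DetailsList.replaceAllIn(raw, m =>
    Regex.quoteReplacement("\"details\":{" + m.group(1) + "}"))

val broken =
  """{"name":"xyz","age":29,"details":["city":"mumbai","country":"India"]}"""

println(repairDetails(broken))
```

Wrapped with org.apache.spark.sql.functions.udf, for example udf(repairDetails _), the function could be applied to the value column before from_json is called. Fixing the data at the source remains the more robust option, since a regex repair like this one breaks as soon as the details values themselves contain brackets.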