Spark: Handling null values in a JSON array field


I am loading the JSON data below. Assume these are deserialized strings coming from Kafka.

{"message":{"title": {"titleid": "111", "titlename": "AAA", "titledesc": null}, "customer": {"customerDetail": {"customerid": 1107879, "rates": [{"type": "Commission", "amount": 0.0, "currency": null}, {"type": "Total CV", "amount": 0.0, "currency": null}]}}}}
{"message":{"title": {"titleid": "222", "titlename": "BBB", "titledesc": null}, "customer": {"customerDetail": {"customerid": 1107875, "rates": [{"type": "Commission", "amount": 0.0, "currency": null}, {"type": "Total CV", "amount": 0.0, "currency": null}]}}}}
{"message":{"title": {"titleid": "333", "titlename": "CCC", "titledesc": null}, "customer": {"customerDetail": {"customerid": 1107882, "rates": [{"type": "Commission", "amount": 0.0, "currency": null}, {"type": "Total CV", "amount": 0.0, "currency": null}]}}}}
{"message":{"title": {"titleid": "444", "titlename": "DDD", "titledesc": null}, "customer": {"customerDetail": {"customerid": 1107880, "rates": [{"type": "Commission", "amount": 0.0, "currency": null}, {"type": "Total CV", "amount": 0.0, "currency": null}]}}}}
{"message":{"title": {"titleid": "555", "titlename": "EEE", "titledesc": null}, "customer": {"customerDetail": {"customerid": 1107884, "rates": [{"type": "Commission", "amount": 0.0, "currency": null}, {"type": "Total CV", "amount": 0.0, "currency": null}]}}}}


val ds = spark.read.textFile("./src/main/resources/json/JsonWithNull.txt").as[String]
ds.printSchema()
ds.show(false)

root
 |-- value: string (nullable = true)

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                                                                                                                               |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"message":{"title": {"titleid": "111", "titlename": "AAA", "titledesc": null}, "customer": {"customerDetail": {"customerid": 1107879, "rates": [{"type": "Commission", "amount": 0.0, "currency": null}, {"type": "Total CV", "amount": 0.0, "currency": null}]}}}}|
|{"message":{"title": {"titleid": "222", "titlename": "BBB", "titledesc": null}, "customer": {"customerDetail": {"customerid": 1107875, "rates": [{"type": "Commission", "amount": 0.0, "currency": null}, {"type": "Total CV", "amount": 0.0, "currency": null}]}}}}|
|{"message":{"title": {"titleid": "333", "titlename": "CCC", "titledesc": null}, "customer": {"customerDetail": {"customerid": 1107882, "rates": [{"type": "Commission", "amount": 0.0, "currency": null}, {"type": "Total CV", "amount": 0.0, "currency": null}]}}}}|
|{"message":{"title": {"titleid": "444", "titlename": "DDD", "titledesc": null}, "customer": {"customerDetail": {"customerid": 1107880, "rates": [{"type": "Commission", "amount": 0.0, "currency": null}, {"type": "Total CV", "amount": 0.0, "currency": null}]}}}}|
|{"message":{"title": {"titleid": "555", "titlename": "EEE", "titledesc": null}, "customer": {"customerDetail": {"customerid": 1107884, "rates": [{"type": "Commission", "amount": 0.0, "currency": null}, {"type": "Total CV", "amount": 0.0, "currency": null}]}}}}|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I then load the Dataset[String] as JSON, and the schema shows every column, including the currency field inside the rates field:

val jsonDF = spark.read.json(ds)
jsonDF.printSchema()
jsonDF.show(false)

root
 |-- message: struct (nullable = true)
 |    |-- customer: struct (nullable = true)
 |    |    |-- customerDetail: struct (nullable = true)
 |    |    |    |-- customerid: long (nullable = true)
 |    |    |    |-- rates: array (nullable = true)
 |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |-- amount: double (nullable = true)
 |    |    |    |    |    |-- currency: string (nullable = true)
 |    |    |    |    |    |-- type: string (nullable = true)
 |    |-- title: struct (nullable = true)
 |    |    |-- titledesc: string (nullable = true)
 |    |    |-- titleid: string (nullable = true)
 |    |    |-- titlename: string (nullable = true)

+-------------------------------------------------------------------+
|message                                                            |
+-------------------------------------------------------------------+
|[[[1107879, [[0.0,, Commission], [0.0,, Total CV]]]], [, 111, AAA]]|
|[[[1107875, [[0.0,, Commission], [0.0,, Total CV]]]], [, 222, BBB]]|
|[[[1107882, [[0.0,, Commission], [0.0,, Total CV]]]], [, 333, CCC]]|
|[[[1107880, [[0.0,, Commission], [0.0,, Total CV]]]], [, 444, DDD]]|
|[[[1107884, [[0.0,, Commission], [0.0,, Total CV]]]], [, 555, EEE]]|
+-------------------------------------------------------------------+
However, when I use the to_json function to convert the rates array column to JSON, the currency field is dropped entirely, presumably because its value is null:

jsonDF.select(to_json(struct($"message.customer.customerDetail.rates")).as("Rates")).show(false)
Output:

+-------------------------------------------------------------------------------+
|Rates                                                                          |
+-------------------------------------------------------------------------------+
|{"rates":[{"amount":0.0,"type":"Commission"},{"amount":0.0,"type":"Total CV"}]}|
|{"rates":[{"amount":0.0,"type":"Commission"},{"amount":0.0,"type":"Total CV"}]}|
|{"rates":[{"amount":0.0,"type":"Commission"},{"amount":0.0,"type":"Total CV"}]}|
|{"rates":[{"amount":0.0,"type":"Commission"},{"amount":0.0,"type":"Total CV"}]}|
|{"rates":[{"amount":0.0,"type":"Commission"},{"amount":0.0,"type":"Total CV"}]}|
+-------------------------------------------------------------------------------+
How can I fix this?
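One possible approach, sketched under the assumption that Spark 3.0 or later is in use: the JSON writer accepts an ignoreNullFields option, and passing it as "false" to to_json should make null struct fields be written as explicit nulls instead of being skipped. The object name KeepNullFields, the helper ratesJson, and the shortened sample record are hypothetical and only serve the illustration; the column path is the one from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, to_json}

object KeepNullFields {

  // Parses the raw JSON strings, then serializes the rates array back to JSON
  // while asking the writer to keep fields whose value is null.
  def ratesJson(spark: SparkSession, lines: Seq[String]): Array[String] = {
    import spark.implicits._

    val ds = lines.toDS()              // stands in for the Dataset[String] from Kafka
    val jsonDF = spark.read.json(ds)   // schema is inferred, as in the question

    jsonDF
      .select(
        to_json(
          struct($"message.customer.customerDetail.rates"),
          // Assumption: Spark 3.0+, where the JSON writer honors this option.
          Map("ignoreNullFields" -> "false")
        ).as("Rates")
      )
      .as[String]
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("KeepNullFields")
      .getOrCreate()

    // Shortened sample record modeled on the data in the question.
    val sample = Seq(
      """{"message":{"customer":{"customerDetail":{"customerid":1107879,"rates":[{"type":"Commission","amount":0.0,"currency":null}]}}}}"""
    )

    ratesJson(spark, sample).foreach(println)
    spark.stop()
  }
}
```

On Spark 3.0+ the same behavior can also be switched on for the whole session through the spark.sql.jsonGenerator.ignoreNullFields setting. Older 2.x releases do not have this option, so a different workaround would be needed there.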