Apache Spark: parsing JSON that contains reserved characters


I have a JSON file, input.txt, with data as follows:

2018-05-30.txt:{"Message":{"eUuid":"6e7d4890-9279-491a-ae4d-70416ef9d42d","schemaVersion":"1.0-AB1","timestamp":1527539376,"id":"XYZ","location":{"dim":{"x":2,"y":-7},"towards":121.0},"source":"a","UniqueId":"test123","code":"del","signature":"xyz","":{},"vel":{"ground":15},"height":{},"next":{"dim":{}},"sub":"del1"}}
2018-05-30.txt:{"Message":{"eUuid":"5e7d4890-9279-491a-ae4d-70416ef9d42d","schemaVersion":"1.0-AB1","timestamp":1627539376,"id":"ABC","location":{"dim":{"x":1,"y":-8},"towards":132.0},"source":"b","UniqueId":"hello123","code":"fra","signature":"abc","":{},"vel":{"ground":16},"height":{},"next":{"dim":{}},"sub":"fra1"}}
.
.
I tried to load the JSON into a dataframe as follows:

2018-05-30.txt:{"Message":{"eUuid":"6e7d4890-9279-491a-ae4d-70416ef9d42d","schemaVersion":"1.0-AB1","timestamp":1527539376,"id":"XYZ","location":{"dim":{"x":2,"y":-7},"towards":121.0},"source":"a","UniqueId":"test123","code":"del","signature":"xyz","":{},"vel":{"ground":15},"height":{},"next":{"dim":{}},"sub":"del1"}}
2018-05-30.txt:{"Message":{"eUuid":"5e7d4890-9279-491a-ae4d-70416ef9d42d","schemaVersion":"1.0-AB1","timestamp":1627539376,"id":"ABC","location":{"dim":{"x":1,"y":-8},"towards":132.0},"source":"b","UniqueId":"hello123","code":"fra","signature":"abc","":{},"vel":{"ground":16},"height":{},"next":{"dim":{}},"sub":"fra1"}}
.
.
>>val df = spark.read.json("<full path of input.txt file>")
and I receive a _corrupt_record dataframe.

I know the JSON lines contain "." (in the 2018-05-30.txt prefix), which is a reserved character, and I assume that is what causes the problem. How can I resolve this?
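
For reference, this is the classic symptom of malformed input: when no line can be parsed, Spark's JSON reader infers a single string column named _corrupt_record (its default name for unparseable rows). A minimal reproduction in spark-shell, with an illustrative path, would presumably look like:

// every line starts with "2018-05-30.txt:", so the JSON parser rejects it
val bad = spark.read.json("input.txt")
bad.printSchema()
// root
//  |-- _corrupt_record: string (nullable = true)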

The problem is not a reserved character; it is that the file does not contain valid JSON. So you can do this:

// read each line as plain text, then strip the 15-character "2018-05-30.txt:" prefix
// (in a compiled app you would also need: import spark.implicits._)
val df = spark.read.textFile(...)
val json = spark.read.json(df.map(v => v.drop(15)))

json.printSchema()
root
 |-- Message: struct (nullable = true)
 |    |-- UniqueId: string (nullable = true)
 |    |-- code: string (nullable = true)
 |    |-- eUuid: string (nullable = true)
 |    |-- id: string (nullable = true)
 |    |-- location: struct (nullable = true)
 |    |    |-- dim: struct (nullable = true)
 |    |    |    |-- x: long (nullable = true)
 |    |    |    |-- y: long (nullable = true)
 |    |    |-- towards: double (nullable = true)
 |    |-- schemaVersion: string (nullable = true)
 |    |-- signature: string (nullable = true)
 |    |-- source: string (nullable = true)
 |    |-- sub: string (nullable = true)
 |    |-- timestamp: long (nullable = true)
 |    |-- vel: struct (nullable = true)
 |    |    |-- ground: long (nullable = true)
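
Once the schema has been inferred like this, the nested fields under Message can be queried with dotted column paths; a quick sanity check (a sketch reusing field names from the schema above):

// project a few nested fields out of the Message struct to verify the parse
json.select("Message.id", "Message.UniqueId", "Message.location.dim.x").show()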

Read the file as plain text and drop the leading characters that cause the problem; you are left with a Dataset[String] whose rows are now valid JSON. Pass that to spark.read.json ... profit.
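
If the filename prefix is ever not exactly 15 characters, a slightly more defensive variant of the same idea (a sketch, not part of the original answer) is to drop everything before the first '{' instead of a fixed count:

// encoder needed for Dataset[String].map when not running in spark-shell
import spark.implicits._

// keep everything from the first '{' onward, so the prefix length may vary
val lines = spark.read.textFile("<full path of input.txt file>")
val json = spark.read.json(lines.map(_.dropWhile(_ != '{')))
json.printSchema()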