Trouble converting JSON to a Spark DataFrame
I've been trying to load JSON into a PySpark DataFrame, but I'm running into some difficulty. Here is what I've tried so far (with and without multiline). JSON file:
testjson = [
('{"id":434, "address" : ["432.432.432.432", "432.432.432.432", "432.432.432.432", "432.432.432.432"]}',),
('{"id":434, "address" : ["432.432.432.432", "432.432.432.432", "432.432.432.432", "432.432.432.432"]}',),
('{"id":434, "address" : ["432.432.432.432", "432.432.432.432", "432.432.432.432", "432.432.432.432"]}',),
('{"id":434, "address" : ["432.432.432.432", "432.432.432.432", "432.432.432.432", "432.432.432.432"]}',),
('{"id":434, "address" : ["432.432.432.432", "432.432.432.432", "432.432.432.432", "432.432.432.432"]}',),
('{"id":434, "address" : ["432.432.432.432", "432.432.432.432", "432.432.432.432", "432.432.432.432"]}',),
]
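When I try to display the DataFrame, I get "corrupt_record". What am I doing wrong?
Try converting it to a list of strings. Spark cannot understand a list of string tuples. Also, json.dumps is unnecessary, because Spark should be able to understand your JSON input as it is.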
df = spark.read.json(sc.parallelize([i[0] for i in testjson]))
df.show(truncate=False)
+--------------------------------------------------------------------+---+
|address |id |
+--------------------------------------------------------------------+---+
|[432.432.432.432, 432.432.432.432, 432.432.432.432, 432.432.432.432]|434|
|[432.432.432.432, 432.432.432.432, 432.432.432.432, 432.432.432.432]|434|
|[432.432.432.432, 432.432.432.432, 432.432.432.432, 432.432.432.432]|434|
|[432.432.432.432, 432.432.432.432, 432.432.432.432, 432.432.432.432]|434|
|[432.432.432.432, 432.432.432.432, 432.432.432.432, 432.432.432.432]|434|
|[432.432.432.432, 432.432.432.432, 432.432.432.432, 432.432.432.432]|434|
+--------------------------------------------------------------------+---+
The JSON appears to be invalid. Try running it through a validator.
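One way to act on the validator suggestion without leaving Python is to parse each record with the standard library's json module before handing anything to Spark. The following is only a rough sketch: the helper name find_invalid_records is hypothetical, and the sample rows are a shortened copy of the data from the question.

```python
import json

# Same shape as the question's data: a list of one-element tuples holding JSON text.
# (Shortened copy of the rows from the question, used here only for illustration.)
testjson = [
    ('{"id":434, "address" : ["432.432.432.432", "432.432.432.432"]}',),
    ('{"id":434, "address" : ["432.432.432.432", "432.432.432.432"]}',),
]


def find_invalid_records(rows):
    """Return (index, error message) for every row whose JSON text fails to parse.

    Hypothetical helper, not part of Spark or of the original post.
    """
    problems = []
    for index, row in enumerate(rows):
        text = row[0]  # unwrap the one-element tuple to get the JSON string
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((index, str(exc)))
    return problems


invalid = find_invalid_records(testjson)
print(invalid)
```

If the returned list is empty, the individual records parse as JSON, and the problem is more likely in how they are passed to Spark (for example as tuples rather than plain strings) than in the JSON text itself.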