Scala/Spark: Flattening multiple JSON objects in an RDD with Scala Spark, but getting invalid data
My code to flatten multiple JSON objects in Scala:
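val data = sc.textFile("/user/cloudera/spark/sample.json")
val nospace = data.map(x => x.trim())
val nospaces = nospace.filter(x => x != "")
// Bring all lines to the driver so they can be grouped in file order
val local = nospaces.collect
var vline = ""
var eline: List[String] = List()
var lcnt = 0
var rcnt = 0
local.foreach { x =>
  vline += x
  if (x == "[") lcnt += 1
  if (x == "]") rcnt += 1 // count closing brackets
  // Balanced counts: one complete JSON array has been accumulated
  if (lcnt == rcnt) {
    eline ++= List(vline)
    lcnt = 0
    rcnt = 0
    vline = ""
  }
}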
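The bracket counts in the snippet above only balance after a whole array has been read if the second test looks for "]" rather than "[". Below is a minimal pure-Scala sketch of that bracket-depth grouping, kept separate from Spark so it can be checked on a plain Seq[String]. It assumes, as in the sample input, that every opening "[" and closing "]" sits alone on its line after trimming; the object name GroupJsonArrays and the method groupJsonArrays are hypothetical.

```scala
// Sketch (assumption): top-level "[" and "]" each appear alone on a line
// after trimming, as in the sample file. Names here are hypothetical.
object GroupJsonArrays {

  /** Concatenates trimmed, non-empty lines until the "[" and "]" counts
    * balance, then emits the accumulated text as one JSON document. */
  def groupJsonArrays(lines: Seq[String]): List[String] = {
    val docs = List.newBuilder[String]
    val current = new StringBuilder
    var depth = 0 // "[" lines seen minus "]" lines seen

    for (raw <- lines) {
      val line = raw.trim
      if (line.nonEmpty) {
        current.append(line)
        if (line == "[") depth += 1
        if (line == "]") depth -= 1
        // Depth back at zero: one complete array has been read
        if (depth == 0) {
          docs += current.toString
          current.clear()
        }
      }
    }
    docs.result()
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "[",
      "{",
      "\"Year\": \"2013\",",
      "\"First Name\": \"JANE\"",
      "},{",
      "\"Year\": \"2013\",",
      "\"First Name\": \"JADE\"",
      "}",
      "]"
    )
    // prints [{"Year": "2013","First Name": "JANE"},{"Year": "2013","First Name": "JADE"}]
    groupJsonArrays(sample).foreach(println)
  }
}
```

Like the original code, this still relies on collecting every line to the driver first, so it only suits files that fit in driver memory.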
My input file with multiple JSON objects:
[
{
"Year": "2013",
"First Name": "JANE",
"County": "A",
"Sex": "F",
"Count": "27"
},{
"Year": "2013",
"First Name": "JADE",
"County": "B",
"Sex": "M",
"Count": "26"
},{
"Year": "2013",
"First Name": "JAMES",
"County": "C",
"Sex": "M",
"Count": "21"
}
]
The input JSON I used:
root@ubuntu:/home/sathya/Desktop/stackoverflo/data# cat /home/sathya/Desktop/stackoverflo/data/sample.json
[
{
"Year": "2013",
"First Name": "JANE",
"County": "A",
"Sex": "F",
"Count": "27"
},{
"Year": "2013",
"First Name": "JADE",
"County": "B",
"Sex": "M",
"Count": "26"
},{
"Year": "2013",
"First Name": "JAMES",
"County": "C",
"Sex": "M",
"Count": "21"
}
]
Code to read the JSON and flatten it into DataFrame columns:
spark.read.option("multiline","true").json("file:///home/sathya/Desktop/stackoverflo/data/sample.json").show()
'''
+-----+------+----------+---+----+
|Count|County|First Name|Sex|Year|
+-----+------+----------+---+----+
|   27|     A|      JANE|  F|2013|
|   26|     B|      JADE|  M|2013|
|   21|     C|     JAMES|  M|2013|
+-----+------+----------+---+----+
'''
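Can you do it with an RDD?
What is the use of going back to RDDs when DataFrames exist? Please use DataFrames, as they are efficient and fast! If this solves your problem, please accept the answer.
Please consider formatting your code before posting. The easier it is to read, the more likely you are to get an answer. Also, please confirm whether the ** is really part of the code.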