
Scala/Spark: flattening multiple JSONs in an RDD with Scala Spark, but getting invalid data


My code to flatten multiple JSON documents in Scala:

val data = sc.textFile("/user/cloudera/spark/sample.json")
val nospace = data.map(x => x.trim())
val nospaces = nospace.filter(x => x != "")
val local = nospaces.collect

var vline = ""
var eline: List[String] = List()
var lcnt = 0  // opening brackets seen so far
var rcnt = 0  // closing brackets seen so far

local.foreach { x =>
  vline += x
  if (x == "[") lcnt += 1
  if (x == "]") rcnt += 1  // the original tested "[" here too -- a bug
  if (lcnt == rcnt && lcnt > 0) {  // brackets balanced: one complete document
    eline ++= List(vline)
    lcnt = 0
    rcnt = 0
    vline = ""
  }
}
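The mutable-counter loop above can also be written as a single `foldLeft` over the collected lines, which avoids the shared `var`s. A minimal sketch (`groupJsonDocs` is an illustrative name, not part of the original code), assuming each element is one trimmed, non-empty line and that a document is complete when its `[`/`]` counts balance:

```scala
// Group trimmed lines into complete JSON documents by balancing brackets.
def groupJsonDocs(lines: Seq[String]): List[String] = {
  // Accumulator: (current bracket depth, buffer being built, finished docs)
  val (_, _, docs) =
    lines.foldLeft((0, "", List.empty[String])) {
      case ((depth, buf, done), line) =>
        val newDepth = depth + line.count(_ == '[') - line.count(_ == ']')
        val newBuf   = buf + line
        if (newDepth == 0 && newBuf.nonEmpty)
          (0, "", done :+ newBuf)  // brackets balanced: one document finished
        else
          (newDepth, newBuf, done)
    }
  docs
}
```

Counting brackets per line (rather than comparing the whole line to `"["`) also handles lines like `},{` or a document that opens and closes on the same line.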
My input, multiple JSON documents:

 [
    {
    "Year": "2013",
    "First Name": "JANE",
    "County": "A",
    "Sex": "F",
    "Count": "27"
    },{
    "Year": "2013",
    "First Name": "JADE",
    "County": "B",
    "Sex": "M",
    "Count": "26"
    },{
    "Year": "2013",
    "First Name": "JAMES",
    "County": "C",
    "Sex": "M",
    "Count": "21"
    }
    ]
The input JSON as fetched:

root@ubuntu:/home/sathya/Desktop/stackoverflo/data# cat /home/sathya/Desktop/stackoverflo/data/sample.json 

[
    {
    "Year": "2013",
    "First Name": "JANE",
    "County": "A",
    "Sex": "F",
    "Count": "27"
    },{
    "Year": "2013",
    "First Name": "JADE",
    "County": "B",
    "Sex": "M",
    "Count": "26"
    },{
    "Year": "2013",
    "First Name": "JAMES",
    "County": "C",
    "Sex": "M",
    "Count": "21"
    }
    ]
Code to read the JSON and flatten it into DataFrame columns:

spark.read.option("multiline","true").json("file:////home/sathya/Desktop/stackoverflo/data/sample.json").show()

+-----+------+----------+---+----+
|Count|County|First Name|Sex|Year|
+-----+------+----------+---+----+
|   27|     A|      JANE|  F|2013|
|   26|     B|      JADE|  M|2013|
|   21|     C|     JAMES|  M|2013|
+-----+------+----------+---+----+
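One follow-up detail worth noting: the flattened column `First Name` contains a space, so SQL-style expressions need backticks around it, or the column can be renamed first. A hedged sketch assuming the same `spark` session and file path as above (`renamed` is an illustrative name):

```scala
// Read the multiline JSON array into a DataFrame, as in the answer above.
val df = spark.read
  .option("multiline", "true")
  .json("file:///home/sathya/Desktop/stackoverflo/data/sample.json")

// Rename the space-containing column so later selects/filters are simpler;
// otherwise it must be quoted with backticks, e.g. selectExpr("`First Name`").
val renamed = df.withColumnRenamed("First Name", "first_name")
renamed.select("first_name", "Count").show()
```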


Do you have to work with the RDD? What is the point of going back to an RDD when DataFrames exist? Please use the DataFrame, since it is efficient and fast! If this solves your problem, please accept the answer.

Please consider formatting your code before posting. The easier it is to read, the more likely you are to get an answer. Also, please confirm whether `**` is really part of the code.