scala - Convert each JSON line to a table - Scala / Apache Spark / Apache Spark SQL - Fatal编程技术网

scala - Convert each JSON line to a table


Here is a sample line from my data file:

{"externalUserId":"f850bgv8-c638-4ab2-a68a d79375fa2091","externalUserPw":null,"ipaddr":null,"eventId":0,"userId":1713703316,"applicationId":489167,"eventType":201,"eventData":"{\"apps\":[\"com.happyadda.jalebi\"],\"appType\":2}","device":null,"version":"3.0.0-b1","bundleId":null,"appPlatform":null,"eventDate":"2017-01-22T13:46:30+05:30"}
I have millions of lines like this. If the whole file were a single JSON document I could use a JSON reader, but how do I handle multiple JSON lines within a single file and convert them into a table?

How can I convert this data into an SQL table with columns:

 |externalUserId |externalUserPw|ipaddr| eventId  |userId    |.......
 |---------------|--------------|------|----------|----------|.......
 |f850bgv8-..... |null          |null  |0         |1713703316|.......

You can use Spark's built-in read.json function. It suits your case well, since each line contains one JSON object.

For example, the following creates a DataFrame from the contents of a JSON file:

val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
More information:

Spark SQL can automatically infer the schema of a JSON dataset and load it as a Dataset[Row]. This conversion can be done with SparkSession.read.json() on either an RDD of Strings or a JSON file.


Note that the file supplied as a JSON file is not a typical JSON file: each line must contain a separate, self-contained, valid JSON object. For more information, see the JSON Lines text format, also called newline-delimited JSON. As a consequence, a regular multi-line JSON file will most often fail.
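As a sketch of that distinction (the file name events.txt and the local SparkSession setup are assumptions, not from the original post):

```scala
import org.apache.spark.sql.SparkSession

object JsonLinesExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-lines")
      .master("local[*]")   // local mode, for illustration only
      .getOrCreate()

    // Default behavior: read.json expects one self-contained JSON object per line
    // (the JSON Lines / newline-delimited JSON format described above).
    val df = spark.read.json("events.txt")   // hypothetical path

    // For a single pretty-printed JSON document spanning many lines,
    // you would instead enable the multiLine option (Spark 2.2+):
    // val df = spark.read.option("multiLine", "true").json("events.json")

    df.printSchema()
    spark.stop()
  }
}
```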

Comments:

- What have you tried so far? What is working and what isn't?
- It is possible. I need more details, e.g. what is the delimiter between two JSON lines in the file? What is the file's format?
- Each line is separated by a newline (\n), and the file is a plain .txt file.
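Since the file is newline-delimited, a minimal sketch that reads it and exposes it as an SQL table might look like this (the path events.txt and the SparkSession setup are assumptions, not from the original post; the column names come from the sample line above):

```scala
import org.apache.spark.sql.SparkSession

object JsonToSqlTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-table")
      .master("local[*]")   // local mode, for illustration only
      .getOrCreate()

    // Each line of the .txt file is one JSON object, so read.json works directly;
    // the schema (externalUserId, eventId, userId, ...) is inferred automatically.
    val df = spark.read.json("events.txt")   // hypothetical path

    // Register the DataFrame as a temporary SQL table and query it.
    df.createOrReplaceTempView("events")
    spark.sql(
      "SELECT externalUserId, externalUserPw, ipaddr, eventId, userId FROM events"
    ).show()

    spark.stop()
  }
}
```

Registering a temp view is what turns the inferred DataFrame into something queryable with plain SQL, which matches the "convert to an SQL table" goal in the question.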