Apache Spark (Scala): How do I get a single element and its sub-elements from a JSON RDD and store them in a new RDD?

I am importing some JSON data from Amazon S3 and storing it in an RDD:

val events_sep22 = spark.read.json("s3://firehose-json-events-stream/2019/09/22/*/*")
Then I use printSchema to take a peek at the data structure:

scala> events_sep22.printSchema()
root
 |-- data: struct (nullable = true)
 |    |-- amount: string (nullable = true)
 |    |-- createdAt: string (nullable = true)
 |    |-- percentage: string (nullable = true)
 |    |-- status: string (nullable = true)
 |-- id: string (nullable = true)
 |-- publishedAt: string (nullable = true)
How can I create a new RDD that contains only data and its sub-elements?

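Use select. Selecting "data" keeps the struct as a single column, while selecting "data.*" expands its fields into top-level columns: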

events_sep22.select("data").printSchema()

root
 |-- data: struct (nullable = true)
 |    |-- amount: string (nullable = true)
 |    |-- createdAt: string (nullable = true)
 |    |-- percentage: string (nullable = true)
 |    |-- status: string (nullable = true)

events_sep22.select("data.*").printSchema()

root
 |-- amount: string (nullable = true)
 |-- createdAt: string (nullable = true)
 |-- percentage: string (nullable = true)
 |-- status: string (nullable = true)
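
For anyone who wants to try this without access to the S3 bucket, here is a minimal sketch of the same two select calls on an in-memory record. Note that spark.read.json returns a DataFrame rather than an RDD, so the sketch calls .rdd at the end to get an actual RDD[Row]. The object name, the helper names nestedData and flattenedData, the local master setting, and the sample record are all assumptions made up for illustration; they are not from the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelectNestedSketch {
  // Keeps "data" as a single struct column.
  def nestedData(events: DataFrame): DataFrame = events.select("data")

  // Expands the fields of the "data" struct into top-level columns.
  def flattenedData(events: DataFrame): DataFrame = events.select("data.*")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption); in spark-shell
    // or on a cluster a session named `spark` is normally provided.
    val spark = SparkSession.builder()
      .appName("select-nested-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample record shaped like the schema in the question.
    val sample = Seq(
      """{"id":"1","publishedAt":"2019-09-22T00:00:00Z","data":{"amount":"10","createdAt":"2019-09-22T00:00:00Z","percentage":"5","status":"ok"}}"""
    ).toDS()

    // Reading JSON from a Dataset[String] yields a DataFrame.
    val events = spark.read.json(sample)

    nestedData(events).printSchema()
    flattenedData(events).printSchema()

    // If an RDD is really required, convert the DataFrame explicitly.
    val dataRdd = flattenedData(events).rdd
    println(s"rows in RDD: ${dataRdd.count()}")

    spark.stop()
  }
}
```

In most cases it is probably simpler to keep working with the DataFrame returned by select and to call .rdd only when an API specifically needs an RDD.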