Apache Spark (Scala): How do I get individual elements and sub-elements from a JSON RDD and store them in a new RDD?
I am importing some JSON data from Amazon S3 and storing it in an RDD:
val events_sep22 = spark.read.json("s3://firehose-json-events-stream/2019/09/22/*/*")
Then I use printSchema to take a peek at the data structure:
scala> events_sep22.printSchema()
root
 |-- data: struct (nullable = true)
 |    |-- amount: string (nullable = true)
 |    |-- createdAt: string (nullable = true)
 |    |-- percentage: string (nullable = true)
 |    |-- status: string (nullable = true)
 |-- id: string (nullable = true)
 |-- publishedAt: string (nullable = true)
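How can I create a new RDD that contains only data and its sub-elements?

Use select. Selecting "data" keeps the struct as a single column, while "data.*" expands its fields into top-level columns: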
events_sep22.select("data").printSchema()
root
 |-- data: struct (nullable = true)
 |    |-- amount: string (nullable = true)
 |    |-- createdAt: string (nullable = true)
 |    |-- percentage: string (nullable = true)
 |    |-- status: string (nullable = true)
events_sep22.select("data.*").printSchema()
root
 |-- amount: string (nullable = true)
 |-- createdAt: string (nullable = true)
 |-- percentage: string (nullable = true)
 |-- status: string (nullable = true)
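
Note that spark.read.json returns a DataFrame rather than an RDD, so if an actual RDD is needed, the selected DataFrame still has to be converted. The following is a minimal sketch of one way to do that, not a definitive implementation. It assumes a local SparkSession and a made-up sample record with the same shape as the schema above; the names EventData, DataToRdd and dataRdd are hypothetical and do not come from the original post.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Encoders, Row, SparkSession}

// Hypothetical case class mirroring the four string fields of the `data` struct.
case class EventData(amount: String, createdAt: String, percentage: String, status: String)

object DataToRdd {
  // Flatten the `data` struct, map each row to EventData, and expose the result as an RDD.
  def dataRdd(events: DataFrame): RDD[EventData] =
    events.select("data.*").as[EventData](Encoders.product[EventData]).rdd

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().appName("data-to-rdd").master("local[*]").getOrCreate()
    import spark.implicits._

    // Made-up record with the same structure as the schema in the question.
    val sample = Seq(
      """{"id":"1","publishedAt":"2019-09-22T10:00:00Z","data":{"amount":"10","createdAt":"2019-09-22T09:59:00Z","percentage":"5","status":"ok"}}"""
    ).toDS()
    val events = spark.read.json(sample)

    // One top-level element plus one sub-element, addressed with dot notation.
    events.select("id", "data.amount").show()

    // Untyped variant: every element of the RDD is a Row holding the four fields.
    val rowRdd: RDD[Row] = events.select("data.*").rdd
    println(rowRdd.count())

    // Typed variant: every element of the RDD is an EventData instance.
    dataRdd(events).collect().foreach(println)

    spark.stop()
  }
}
```

The typed variant is convenient when later RDD operations need field access by name (for example _.status); the untyped Row variant avoids defining a case class but requires getAs[String]("status") style lookups.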