
Converting a JSON object into separate columns in Spark (Java)


I have a Spark Dataset and would like to convert it into separate columns.

Using Spark 2.2 and Java 1.8.

DF.printSchema()
root
 |-- ute.internal.id: string (nullable = false)
 |-- ute.features.serialized: string (nullable = false)

DF.show()

{"ute.id":"123","ute.isBoolean":"true","ute.sortPriority":"5"},
{"ute.id":"456","ute.isBoolean":"false","ute.sortPriority":"6"}

Expected output
===============
ute.id|ute.isBoolean|ute.sortPriority
123   |true         |5
456   |false        |6
Can anyone help? Thanks.

// read.json expects an RDD[String], so extract the serialized column first (backticks because the name contains dots)
val newDf = sqlContext.read.json(df.select("`ute.features.serialized`").rdd.map(_.getString(0)))
This gives you a DataFrame with a separate column for every key in the JSON.

Example:

  val json2 ="""{"ute.id":"123","ute.isBoolean":"true","ute.sortPriority":"5"},
              |{"ute.id":"456","ute.isBoolean":"false","ute.sortPriority":"6"}"""
  val jsonRdd = sc.parallelize(Seq(json2))
  val sqlContext = new SQLContext(sc)


  val df = sqlContext.read.json(jsonRdd)
  df.show(false)

+------+-------------+----------------+
|ute.id|ute.isBoolean|ute.sortPriority|
+------+-------------+----------------+
|123   |true         |5               |
|456   |false        |6               |
+------+-------------+----------------+
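One detail worth calling out: every column name here contains dots, and Spark's parser treats an unquoted dotted name as a nested-field path. A minimal sketch of referencing such columns (names taken from the output above):

  import org.apache.spark.sql.functions.col

  // dots in a column name look like struct-field access to Spark,
  // so names such as ute.id must be backtick-quoted when referenced
  df.select(col("`ute.id`"), col("`ute.isBoolean`")).show(false)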

The schema you provided does not match the output of show: the other column, ute.features.serialized, is not visible. Please provide one.

ute.features.serialized is one of the DataFrame's columns, so I don't think we can use the read.json approach.
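When the JSON sits in a column of an existing DataFrame, as in the question, one alternative is from_json (available since Spark 2.1), which parses the string in place instead of round-tripping through an RDD. A sketch, assuming the column and key names from the question and an all-string schema:

  import org.apache.spark.sql.functions.{col, from_json}
  import org.apache.spark.sql.types.{StringType, StructField, StructType}

  // declare the schema of the serialized JSON up front (keys from the question)
  val jsonSchema = StructType(Seq(
    StructField("ute.id", StringType),
    StructField("ute.isBoolean", StringType),
    StructField("ute.sortPriority", StringType)
  ))

  // parse the string column into a struct, then flatten it into top-level columns
  val parsed = df
    .withColumn("parsed", from_json(col("`ute.features.serialized`"), jsonSchema))
    .select("parsed.*")

  parsed.show(false)

Unlike the read.json detour, this keeps each parsed record aligned with its original row, so other columns such as ute.internal.id can be carried along in the same select.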