
JSON Spark Dataset with an inner array


I am trying to read JSON into a Dataset (Spark 2.1.1). Unfortunately it doesn't work, and fails with:

Caused by: java.lang.NullPointerException: Null value appeared in non-nullable field:
- field (class: "scala.Long", name: "age")
Any idea what I am doing wrong?

import org.apache.spark.sql.SparkSession

case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: Long)

val sampleJson = """{"id":"kotek", "pets":[{"name":"miauczek", "age":18}, {"name":"miauczek2", "age":9}]}"""

val session = SparkSession.builder().master("local").getOrCreate()
import session.implicits._

val rdd = session.sparkContext.parallelize(Seq(sampleJson))
val ds = session.read.json(rdd).as[Owner].collect()

In general, if some of the fields may be missing, use an Option:

case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: Option[Long])
or a nullable type:

case class Owner(id: String, pets: Seq[Pet])
case class Pet(name: String, age: java.lang.Long)
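The two alternatives above differ in how absence is represented on the JVM: the primitive `scala.Long` can never hold `null`, while `Option[Long]` models absence explicitly and `java.lang.Long` is a boxed reference type that admits `null`. A minimal Spark-free sketch (the object and value names are illustrative, not part of any API):

```scala
object NullableAgeDemo {
  // Option[Long]: a missing or null "age" becomes None
  case class OptPet(name: String, age: Option[Long])

  // java.lang.Long: a boxed reference type, so null is allowed,
  // unlike the primitive scala.Long
  case class BoxedPet(name: String, age: java.lang.Long)

  val opt   = OptPet("miauczek", None)
  val boxed = BoxedPet("miauczek", null)

  // Option forces callers to handle the missing case explicitly
  def ageOrDefault(p: OptPet): Long = p.age.getOrElse(-1L)
}
```

With `Option`, a caller cannot forget the missing case; with `java.lang.Long`, a `null` check remains the caller's responsibility.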
But this one really does look like a bug. I tested this in Spark 2.2 and it is fixed there. I think a quick workaround is to sort the fields by name:

case class Owner(id: String, pets: Seq[Pet])
case class Pet(age: java.lang.Long, name: String)
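The idea behind this workaround (a Spark-free illustration, not Spark code): the inferred JSON schema lists struct fields alphabetically, so declaring the case class fields in the same alphabetical order keeps any positional binding consistent. The object and value names below are hypothetical:

```scala
object FieldOrderDemo {
  // Field order as written in the JSON object
  val jsonFields = Seq("name", "age")

  // Schema inference presents struct fields in alphabetical order
  val inferredOrder = jsonFields.sorted // Seq("age", "name")

  // Declared to match the inferred (alphabetical) column order
  case class Pet(age: java.lang.Long, name: String)
  val pet = Pet(18L, "miauczek")
}
```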

I believe this is a bug in Spark, if I understand correctly what is happening here: Spark does not map the inner type ("pets") by name; it maps the attributes by sorted order. So pets.age gets mapped to Pet.name, and when it tries to map pets.name to Pet.age it fails with the exception. Can anyone confirm that my understanding is correct and that this is a Spark bug?
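The suspected mismatch can be sketched schematically (this is an illustration of the hypothesis above, not Spark internals):

```scala
object MismatchDemo {
  // Field order as declared in the case class Pet(name, age)
  val declaredFields = Seq("name", "age")

  // Column order actually delivered for the inner struct (alphabetical)
  val deliveredColumns = Seq("age", "name")

  // Hypothetical positional binding: pets.age would land on Pet.name
  // and pets.name on Pet.age, which then blows up on the Long field
  val positionalBinding = declaredFields.zip(deliveredColumns)
}
```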