Scala Spark reading JSON


My code looks like this:

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf().setAppName("Json Test").setMaster("local[*]")
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

val path = "/path/log.json"
val df = sqlContext.read.json(path)
df.show()
Sample JSON data:

    {"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"31","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"5","up":null,"Crate":"2"},{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"},{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"}]}

The following error occurs in the Scala IDE, and I can't make sense of it:

    INFO SharedState: Warehouse path is 'file:/C:/Users/ben53/workspace/Demo/spark-warehouse/'.
    Exception in thread "main" java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.hive.orc.DefaultSource could not be instantiated
        at java.util.ServiceLoader.fail(Unknown Source)
        at java.util.ServiceLoader.access$100(Unknown Source)
        at java.util.ServiceLoader$LazyIterator.nextService(Unknown Source)
        at java.util.ServiceLoader$LazyIterator.next(Unknown Source)
        at java.util.ServiceLoader$1.next(Unknown Source)
        at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
        at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
        at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:575)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
        at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:298)
        at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:251)
        at com.dataflair.spark.QueryLog$.main(QueryLog.scala:27)
        at com.dataflair.spark.QueryLog.main(QueryLog.scala)
    Caused by: java.lang.VerifyError: Bad return type
    Exception Details:
      Location:
        org/apache/spark/sql/hive/orc/DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;[Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Map;)Lorg/apache/spark/sql/sources/HadoopFsRelation; @35
      Reason:
        Type 'org/apache/spark/sql/hive/orc/OrcRelation' (current frame, stack[0]) is not assignable to 'org/apache/spark/sql/sources/HadoopFsRelation' (from method signature)
      Current Frame:
        bci: @35
        flags: { }
        locals: { 'org/apache/spark/sql/hive/orc/DefaultSource', 'org/apache/spark/sql/SQLContext', '[Ljava/lang/String;', 'scala/Option', 'scala/Option', 'scala/collection/immutable/Map' }
        stack: { 'org/apache/spark/sql/hive/orc/OrcRelation' }
      Bytecode:
        0x0000000: b200 1c2b c100 1ebb 000e 592a b700 22b6
        0x0000010: 0026 bb00 2859 2c2d b200 2d19 0419 052b
        0x0000020: b700 30b0

        at java.lang.Class.getDeclaredConstructors0(Native Method)
        at java.lang.Class.privateGetDeclaredConstructors(Unknown Source)
        at java.lang.Class.getConstructor0(Unknown Source)
        at java.lang.Class.newInstance(Unknown Source)
        ... 20 more



I'm fairly sure your path is incorrect. Please check that the file exists at the specified path. The JSON itself is valid.

The path should be fine, but the JSON provided is invalid. Please correct the sample JSON and try again.
You can validate the JSON with an online validator; it highlights the invalid part of the JSON.
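You can also see the problem from Spark itself: with the default PERMISSIVE parse mode, input lines that fail to parse are collected into a `_corrupt_record` column while the real fields come back null (which matches the null-filled dataset mentioned in the comments at the end). Below is a minimal sketch, assuming the sqlContext from the question and a hypothetical path; note that sqlContext.read.json also expects each JSON record to sit on a single line (a multiLine option only arrived in Spark 2.2):

val check = sqlContext.read.json("/path/log.json") // hypothetical path
check.printSchema()
// A `_corrupt_record` field in the printed schema means some lines did
// not parse as JSON; the raw text of those lines is kept in that column
// and the real columns stay null.
check.show(false)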

When I tried this sample, I got the following result:

    +---+--------------------+----+-------------+
    |COL|                DATA|IFAM|          KTM|
    +---+--------------------+----+-------------+
    | 21|[[2,30,0,null], [...| EQR|1430006400000|
    +---+--------------------+----+-------------+
The code used was:

import org.apache.spark.{SparkConf, SparkContext}

object Test {

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Json Test").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val path = "/home/test/Desktop/test.json"
    val df = sqlContext.read.json(path)
    df.show()
  }
}
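Since DATA comes back as an array of structs, a natural next step is to flatten it. A short sketch that would continue inside main above (so that sqlContext.implicits._ is in scope for the $ syntax); the nested field names Crate, MLrate, Nrout, and up are taken from the sample JSON:

import org.apache.spark.sql.functions.explode

// One output row per element of the DATA array, then select the
// nested struct fields by name.
val flat = df.select($"COL", $"IFAM", $"KTM", explode($"DATA").as("d"))
  .select($"COL", $"IFAM", $"KTM", $"d.Crate", $"d.MLrate", $"d.Nrout", $"d.up")
flat.show()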

Is your JSON valid? Can you share your pom or sbt file and a sample of the JSON file?
Yes, it is valid JSON. I compile and run it with the Eclipse Scala IDE, Scala version 2.1.
The JSON is correct. I tested it and built the DataFrame with the code and JSON above. :)
The JSON I updated is correct; I think a moderator broke it while editing the JSON section. The error doesn't seem to be about the JSON format but about the Spark setup.
The answer was still useful, because my JSON had a problem I hadn't noticed! The dataset was filled with nulls, but now it is populated correctly.
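Regarding the VerifyError in the question: "OrcRelation ... is not assignable to HadoopFsRelation" is a common symptom of mixed Spark versions on the classpath, for example a Spark 1.x spark-hive jar loaded next to a Spark 2.x spark-sql. A hedged build.sbt sketch, not the asker's actual build file; the exact version number is an assumption, the point is that every Spark artifact shares one version:

// build.sbt -- a sketch; pick whichever single Spark version you target
val sparkVersion = "2.1.0" // assumed

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion // only if Hive support is needed
)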