Reading JSON with Spark in Scala
My code is as follows:
val sparkConf = new SparkConf().setAppName("Json Test").setMaster("local[*]")
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val path = "/path/log.json"
val df = sqlContext.read.json(path)
df.show()
+---+--------------------+----+-------------+
|COL| DATA|IFAM| KTM|
+---+--------------------+----+-------------+
| 21|[[2,30,0,null], [...| EQR|1430006400000|
+---+--------------------+----+-------------+
Sample JSON data:
{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"5","up":null,"Crate":"2"},{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"},{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"}]}
An error occurs in the Scala IDE, and I cannot make sense of it:
INFO SharedState: Warehouse path is 'file:/C:/Users/ben53/workspace/Demo/spark-warehouse/'.
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.hive.orc.DefaultSource could not be instantiated
    at java.util.ServiceLoader.fail(Unknown Source)
    at java.util.ServiceLoader.access$100(Unknown Source)
    at java.util.ServiceLoader$LazyIterator.nextService(Unknown Source)
    at java.util.ServiceLoader$LazyIterator.next(Unknown Source)
    at java.util.ServiceLoader$1.next(Unknown Source)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
    at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:575)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:298)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:251)
    at com.dataflair.spark.QueryLog$.main(QueryLog.scala:27)
    at com.dataflair.spark.QueryLog.main(QueryLog.scala)
Caused by: java.lang.VerifyError: Bad return type
Exception Details:
  Location:
    org/apache/spark/sql/hive/orc/DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;[Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Map;)Lorg/apache/spark/sql/sources/HadoopFsRelation; @35
  Reason:
    Type 'org/apache/spark/sql/hive/orc/OrcRelation' (current frame, stack[0]) is not assignable to 'org/apache/spark/sql/sources/HadoopFsRelation' (from method signature)
  Current Frame:
    bci: @35
    flags: { }
    locals: { 'org/apache/spark/sql/hive/orc/DefaultSource', 'org/apache/spark/sql/SQLContext', '[Ljava/lang/String;', 'scala/Option', 'scala/Option', 'scala/collection/immutable/Map' }
    stack: { 'org/apache/spark/sql/hive/orc/OrcRelation' }
  Bytecode:
    0x0000000: b200 1c2b c100 1ebb 000e 592a b700 22b6
    0x0000010: 0026 bb00 2859 2c2d b200 2d19 0419 052b
    0x0000020: b700 30b0
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Unknown Source)
    at java.lang.Class.getConstructor0(Unknown Source)
    at java.lang.Class.newInstance(Unknown Source)
    ... 20 more
I am pretty sure your path is not correct. Please check whether the file actually exists at the specified path.
The JSON is valid, and the path should be correct.
But the JSON as provided is not valid. Please correct the sample JSON and try again.
You can validate the JSON at
It shows the invalid parts of the JSON.
When I tried this sample, I got the following result:
+---+--------------------+----+-------------+
|COL| DATA|IFAM| KTM|
+---+--------------------+----+-------------+
| 21|[[2,30,0,null], [...| EQR|1430006400000|
+---+--------------------+----+-------------+
The code used is as follows:
object Test {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Json Test").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val path = "/home/test/Desktop/test.json"
    val df = sqlContext.read.json(path)
    df.show()
  }
}
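For what it's worth, the ServiceConfigurationError/VerifyError in the trace above usually points to mismatched Spark module versions on the classpath (for example, a spark-sql from one release mixed with a spark-hive from another), rather than to the JSON itself: the ORC DefaultSource was compiled against a different HadoopFsRelation signature than the one it finds at runtime. A hedged build.sbt sketch (the version number is purely illustrative) that keeps every Spark artifact on one version:

```scala
// Hypothetical build.sbt fragment. The key point is a single shared version
// for all Spark modules, so that classes loaded via ServiceLoader (such as
// org.apache.spark.sql.hive.orc.DefaultSource) match the signatures they
// were compiled against.
val sparkVersion = "2.1.0" // illustrative; pick one version and use it everywhere

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion
)
```

After aligning the versions, a clean build (`sbt clean compile`, or the Eclipse equivalent) ensures no stale jars from the old mix remain on the classpath.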
Is your JSON valid? Could you share your pom or sbt file and a sample of the JSON file?
Yes, it is valid JSON. I compile and run it with the Eclipse Scala IDE, Scala version 2.1.
The JSON is correct. I tested it and built the DataFrame shown above from it with that code. :)
Update: my JSON is correct; I think a moderator introduced the error while editing the JSON section. The error does not seem to be about the JSON format but about the Spark setup.
The answer was helpful, because there was a problem with my JSON that I had not noticed! The dataset was being filled with nulls, but now it is populated correctly.
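The closing remark about the dataset first coming back full of nulls matches a common gotcha: before the multiLine option appeared in Spark 2.2, read.json expected one complete JSON record per line (JSON Lines), so a pretty-printed record spanning several lines was parsed as corrupt and every column came out null. A minimal sketch (no Spark required; the object name is illustrative) of collapsing a pretty-printed record onto one line:

```scala
// Sketch: collapse a pretty-printed JSON record onto a single line so that
// line-oriented JSON readers (one record per line) can parse it.
// Naive assumption: no string values contain embedded newlines or
// significant leading/trailing whitespace.
object JsonLines {
  def toSingleLine(prettyJson: String): String =
    prettyJson.split("\n").map(_.trim).mkString("")

  def main(args: Array[String]): Unit = {
    val pretty = "{\n  \"IFAM\": \"EQR\",\n  \"COL\": 21\n}"
    println(toSingleLine(pretty)) // {"IFAM": "EQR","COL": 21}
  }
}
```

Writing each record out through such a step (or, on Spark 2.2+, passing option("multiLine", true) to the reader) avoids the all-null DataFrame.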