Error accessing ORC files from PySpark on Spark 1.2 using the newAPIHadoopFile API


Can you tell me how to resolve java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.io.orc.OrcStruct.<init>()?

Command used to launch pyspark:

pyspark --jars "hive-exec-0.13.1-cdh5.3.3.jar,hadoop-common-2.5.0-cdh5.3.3.jar,hadoop-mapreduce-client-app-2.5.0-cdh5.3.3.jar,hadoop-mapreduce-client-common-2.5.0-cdh5.3.3.jar,hadoop-mapreduce-client-core-2.5.0-cdh5.3.3.jar,hadoop-core-2.5.0-mr1-cdh5.3.3.jar,…

Then run the following in the pyspark shell:

distFile = sc.newAPIHadoopFile(path="orcdatafolder/", inputFormatClass="org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat", keyClass="org.apache.hadoop.io.NullWritable", valueClass="org.apache.hadoop.hive.ql.io.orc.OrcStruct")
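For context, this call only blows up later, when values are pulled back to Python, because PySpark clones each Hadoop Writable via ReflectionUtils, which needs a no-argument constructor that OrcStruct apparently does not provide (that is exactly what the NoSuchMethodException below says). One common way to sidestep the raw InputFormat path on Spark 1.2 is to let Hive/Spark SQL decode the ORC data instead. A minimal sketch, assuming a Hive-enabled Spark build and an existing ORC-backed Hive table (the table name orc_events here is hypothetical):

from pyspark import SparkContext
from pyspark.sql import HiveContext

# Reading through HiveContext lets Hive's ORC SerDe do the decoding,
# so PySpark never has to clone OrcStruct objects itself.
sc = SparkContext(appName="orc-read-sketch")
hive_ctx = HiveContext(sc)

# In Spark 1.2 this returns a SchemaRDD of Row objects.
rows = hive_ctx.sql("SELECT * FROM orc_events LIMIT 10")
for row in rows.collect():
    print(row)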

Error:

16/07/31 19:49:53 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, sj1dra096.corp.adobe.com): java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.io.orc.OrcStruct.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.io.WritableUtils.clone(WritableUtils.java:217)
    at org.apache.spark.api.python.WritableToJavaConverter.org$apache$spark$api$python$WritableToJavaConverter$$convertWritable(PythonHadoopUtil.scala:96)
    at org.apache.spark.api.python.WritableToJavaConverter.convert(PythonHadoopUtil.scala:104)
    at org.apache.spark.api.python.PythonHadoopUtil$$anonfun$convertRDD$1.apply(PythonHadoopUtil.scala:183)
    at org.apache.spark.api.python.PythonHadoopUtil$$anonfun$convertRDD$1.apply(PythonHadoopUtil.scala:183)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$26.apply(RDD.scala:1081)
    at org.apache.spark.rdd.RDD$$anonfun$26.apply(RDD.scala:1081)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1319)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1319)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.io.orc.OrcStruct.<init>()
    at java.lang.Class.getConstructor0(Class.java:2849)
    at java.lang.Class.getDeclaredConstructor(Class.java:2053)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
    ... 28 more
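The trace shows the failure inside WritableUtils.clone -> ReflectionUtils.newInstance, which tries to instantiate the value class through its no-argument constructor; the NoSuchMethodException means OrcStruct does not declare one. If you want to confirm that from the same pyspark shell, a quick py4j reflection check along these lines should work, assuming the hive-exec jar is actually visible to the driver JVM (e.g. passed with --driver-class-path as well as --jars):

# Diagnostic sketch (not a fix): list OrcStruct's declared constructors to
# verify there is no public no-arg constructor for ReflectionUtils to call.
jvm = sc._jvm  # py4j gateway into the driver JVM; sc is the shell's SparkContext
orc_struct_cls = jvm.java.lang.Class.forName(
    "org.apache.hadoop.hive.ql.io.orc.OrcStruct")
for ctor in orc_struct_cls.getDeclaredConstructors():
    print(ctor.toString())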