
Elasticsearch query using pyspark and the elasticsearch-hadoop connector throws an exception in RecordReader.close


Reading an RDD from Elasticsearch throws this exception:

org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: ActionRequestValidationException[Validation Failed: 1: no scroll ids specified;]

along with this warning:

mr.EsInputFormat: Cannot determine task id

Software versions: pyspark 1.6, elasticsearch-hadoop-2.2.1 as the Elasticsearch connector, Elasticsearch 1.0.1, Hadoop 2.7.2, and Python 2.7.

The elasticsearch-hadoop-2.2.1 library was taken from here:

Code:

es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={"es.resource": "INDEX/TYPE", "es.nodes": "NODE_NAME"})
print(es_rdd.first())
Please help resolve this exception.
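For context, the conf argument is just a string-to-string dict of es-hadoop settings that gets copied into a Hadoop Configuration. Here is a minimal sketch of the same configuration with an explicit scroll batch size added (es.scroll.size is a documented es-hadoop setting; "INDEX/TYPE" and "NODE_NAME" remain placeholders):

```python
# es-hadoop settings passed to sc.newAPIHadoopRDD(..., conf=conf).
# Every value must be a string, because the dict is translated into
# Hadoop Configuration entries on the JVM side.
conf = {
    "es.resource": "INDEX/TYPE",  # placeholder index/type to read
    "es.nodes": "NODE_NAME",      # placeholder Elasticsearch host
    "es.scroll.size": "50",       # documents fetched per scroll request
}

# Sanity check: non-string values (e.g. integers) would fail conversion.
assert all(isinstance(v, str) for v in conf.values())
```

This does not change the failure by itself; it only makes the scroll batching, which the failing scroll cleanup belongs to, explicit.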

The following warning is printed before the exception and may be related to the underlying problem: mr.EsInputFormat: Cannot determine task id

INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id

Full exception:

es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={"es.resource": "INDEX/TYPE", "es.nodes": "NODE_NAME"})
print(es_rdd.first())
16/04/26 21:00:02 INFO rdd.NewHadoopRDD: Input split: ShardInputSplit [node=[KHHV8pgMQySzw9Fz1Xt7VQ/Iguana|135.17.42.49:9200],shard=0]
16/04/26 21:00:02 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
16/04/26 21:00:02 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/04/26 21:00:02 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id

16/04/26 19:31:12 WARN mr.EsInputFormat: Cannot determine task id...
16/04/26 19:31:14 WARN rdd.NewHadoopRDD: Exception in RecordReader.close()
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: ActionRequestValidationException[Validation Failed: 1: no scroll ids specified;]
	at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:478)
	at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:449)
	at org.elasticsearch.hadoop.rest.RestClient.deleteScroll(RestClient.java:512)
	at org.elasticsearch.hadoop.rest.ScrollQuery.close(ScrollQuery.java:70)
	at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.close(EsInputFormat.java:262)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.org$apache$spark$rdd$NewHadoopRDD$$anon$$close(NewHadoopRDD.scala:191)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:166)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:118)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:110)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:110)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
	at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
	at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)

Thanks, everyone!