EOFException when running Nutch on Hadoop

I am running Nutch 2.3 with Gora 0.6 on Hadoop 2.5.2 and HBase 0.98.12. When the Nutch generate step executes, Hadoop throws an EOFException. Any suggestions are welcome.

2015-05-18 15:22:06,578 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 100% reduce 0%
2015-05-18 15:22:13,697 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 100% reduce 50%
2015-05-18 15:22:14,720 INFO [main] mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id : attempt_1431932258783_0006_r_000001_0, Status : FAILED
Error: java.io.EOFException
    at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
    at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
    at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
    at org.apache.hadoop.io.serializer.avro.AvroSerialization$AvroDeserializer.deserialize(AvroSerialization.java:127)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:146)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

2015-05-18 15:22:21,901 INFO [main] mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id : attempt_1431932258783_0006_r_000001_1, Status : FAILED
Error: java.io.EOFException
    at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
    at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
    at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
    at org.apache.hadoop.io.serializer.avro.AvroSerialization$AvroDeserializer.deserialize(AvroSerialization.java:127)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:146)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

2015-05-18 15:22:28,986 INFO [main] mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id : attempt_1431932258783_0006_r_000001_2, Status : FAILED
Error: java.io.EOFException
    at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
    at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
    at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
    at org.apache.hadoop.io.serializer.avro.AvroSerialization$AvroDeserializer.deserialize(AvroSerialization.java:127)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:146)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization</value>
  <description>A list of serialization classes that can be used for
  obtaining serializers and deserializers.</description>
</property>
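
For context on why this property matters here: the failing reduce tasks are reading values through Hadoop's own AvroSerialization$AvroDeserializer, and Hadoop's default io.serializations list also contains AvroSpecificSerialization and AvroReflectSerialization, which can end up being chosen for Gora's Persistent records instead of the serializers Gora registers itself (org.apache.gora.mapreduce.StringSerialization / PersistentSerialization). The override above only helps if the submitted job actually sees it, so a sketch of the same property placed in nutch-site.xml (or the cluster's core-site.xml) is shown below; whether this is the root cause on a given cluster still needs verifying.

<!-- Sketch only: the same override placed where the submitted job will pick it up,
     e.g. nutch-site.xml or the cluster's core-site.xml. Keeping the list to
     WritableSerialization keeps Hadoop's AvroSpecificSerialization /
     AvroReflectSerialization from being selected for Gora's Persistent values. -->
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization</value>
</property>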
<dependency org="org.apache.gora" name="gora-hbase" rev="0.6.1" conf="*->default" />
<dependency org="org.apache.solr" name="solr-solrj" rev="4.1.0" conf="*->default" />
<dependency org="org.apache.hbase" name="hbase-common" rev="0.98.8-hadoop2" conf="*->default" />
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<!-- Here you have to set the path where you want HBase to store its built-in zookeeper files. -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>hdfs://localhost:9000/zookeeper</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
</configuration>
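
One detail worth double-checking in the hbase-site.xml above: hbase.zookeeper.property.dataDir maps to the bundled ZooKeeper's dataDir, which is a local filesystem directory, so an hdfs:// URL is probably not what it expects. A sketch with a hypothetical local path:

<!-- Sketch only; /usr/local/hbase/zookeeper is a hypothetical local directory. -->
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/hbase/zookeeper</value>
</property>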
<configuration>
<property>
<name>http.agent.name</name>
<value>NutchSpider</value>
</property>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.hbase.store.HBaseStore</value>
<description>Default class for storing data</description>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
</configuration>