MapOutputBuffer$Buffer.write in MapTask中的ArrayIndexOutOfBoundsException(Hadoop 2.7.1)

MapOutputBuffer$Buffer.write in MapTask中的ArrayIndexOutOfBoundsException(Hadoop 2.7.1),hadoop,Hadoop,在Hadoop 2.7.1上运行的滚烫驱动作业中,ArrayIndexOutOfBounds的情况非常奇怪。下面是Mapper日志转储。看起来赤道在溢出2中不知何故被设置为负数。这正常吗 2015-08-12 23:39:19,649 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 1 2015-08-12 23:39:20,174 INFO [main] org.apache.hadoop.mapred.MapTask

在Hadoop 2.7.1上运行的滚烫驱动作业中,ArrayIndexOutOfBounds的情况非常奇怪。下面是Mapper日志转储。看起来赤道在溢出2中不知何故被设置为负数。这正常吗

2015-08-12 23:39:19,649 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2015-08-12 23:39:20,174 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 469762044(1879048176)
2015-08-12 23:39:20,175 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 1792
2015-08-12 23:39:20,175 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 187904816
2015-08-12 23:39:20,175 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 1879048192
2015-08-12 23:39:20,175 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 469762044; length = 117440512
2015-08-12 23:39:20,214 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2015-08-12 23:39:20,216 INFO [main] cascading.flow.hadoop.FlowMapper: cascading version: 2.6.1
2015-08-12 23:39:20,216 INFO [main] cascading.flow.hadoop.FlowMapper: child jvm opts: -Xmx1024m -Djava.io.tmpdir=./tmp
2015-08-12 23:39:20,516 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2015-08-12 23:39:20,552 INFO [main] cascading.flow.hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['docId', 'otherDocId', 'score']]"][9909013673/_pipe_11__pipe_12/]
2015-08-12 23:39:20,552 INFO [main] cascading.flow.hadoop.FlowMapper: sinking to: GroupBy(_pipe_11+_pipe_12)[by:[
{1}
:'docId']]
2015-08-12 23:39:29,424 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
2015-08-12 23:39:29,424 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 108647886; bufvoid = 1879048192
2015-08-12 23:39:29,424 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 469762044(1879048176); kvend = 449947816(1799791264); length = 19814229/117440512
2015-08-12 23:39:29,425 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 839953118 kvi 209988272(839953088)
2015-08-12 23:39:43,985 INFO [SpillThread] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.gz]
2015-08-12 23:39:46,767 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 0
2015-08-12 23:39:46,767 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator 839953118 kv 209988272(839953088) kvi 178264648(713058592)
2015-08-12 23:39:46,767 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
2015-08-12 23:39:46,767 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 839953118; bufend = 1014433072; bufvoid = 1879048192
2015-08-12 23:39:46,767 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 209988272(839953088); kvend = 178264648(713058592); length = 31723625/117440512
2015-08-12 23:39:46,767 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 1696670336 kvi 424167580(1696670320)
2015-08-12 23:40:22,641 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 1
2015-08-12 23:40:22,641 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator 1696670336 kv 424167580(1696670320) kvi 392768808(1571075232)
2015-08-12 23:40:22,641 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
2015-08-12 23:40:22,641 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 1696670336; bufend = 1869363604; bufvoid = 1879048192
2015-08-12 23:40:22,641 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 424167580(1696670320); kvend = 392768808(1571075232); length = 31398773/117440512
2015-08-12 23:40:22,642 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) -1742031900 kvi 34254072(137016288)
2015-08-12 23:40:47,329 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 2
2015-08-12 23:40:47,330 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator -1742031900 kv 34254072(137016288) kvi 34254072(137016288)
2015-08-12 23:40:47,331 ERROR [main] cascading.flow.stream.TrapHandler: caught Throwable, no trap available, rethrowing
cascading.flow.stream.DuctException: internal error: ['7541904654925238223', '2.812180059539485']
at cascading.flow.hadoop.stream.HadoopGroupByGate.receive(HadoopGroupByGate.java:81)
at cascading.flow.hadoop.stream.HadoopGroupByGate.receive(HadoopGroupByGate.java:37)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:80)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133)
at cascading.operation.Identity$2.operate(Identity.java:137)
at cascading.operation.Identity.operate(Identity.java:150)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ArrayIndexOutOfBoundsException
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349)
at java.io.DataOutputStream.write(DataOutputStream.java:88)
at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
at cascading.tuple.hadoop.io.HadoopTupleOutputStream.writeIntInternal(HadoopTupleOutputStream.java:155)
at cascading.tuple.io.TupleOutputStream.write(TupleOutputStream.java:86)
at cascading.tuple.io.TupleOutputStream.writeTuple(TupleOutputStream.java:64)
at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:37)
at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:28)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1149)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
at cascading.tap.hadoop.util.MeasuredOutputCollector.collect(MeasuredOutputCollector.java:69)
at cascading.flow.hadoop.stream.HadoopGroupByGate.receive(HadoopGroupByGate.java:68)
... 18 more

我怀疑有线程问题,所以我尝试了下面的方法,它成功了。不确定治疗是否有效

<property> 
<name>mapreduce.map.sort.spill.percent</name> 
<value>0.8</value> 
</property> 

<property> 
<name>mapreduce.task.io.sort.factor</name> 
<value>10</value> 
</property> 

<property> 
<name>mapreduce.task.io.sort.mb</name> 
<value>100</value> 
</property> 

<property> 
<name>mapred.map.multithreadedrunner.threads</name> 
<value>1</value> 
</property> 

<property> 
<name>mapreduce.mapper.multithreadedmapper.threads</name> 
<value>1</value> 
</property> 

mapreduce.map.sort.spill.percent
0.8
mapreduce.task.io.sort.factor
10
mapreduce.task.io.sort.mb
100
mapred.map.multi-threadedRunner.threads
1.
mapreduce.mapper.multithreadedmapper.threads
1.

是mapreduce.task.io.sort.mb起了作用。当设置为2G或更大时,它会不断遇到问题。 建议设置为以下值或更小值:

Dmapreduce.task.io.sort.mb=1792

对于未来的读者:尝试将“mapreduce.task.io.sort.mb”减少到更小的值。我在Hadoop中也遇到了同样的问题,但我的工作是从pig开始的。很奇怪。我将尝试减少排序内存,正如您所说的@minutis实际上,当合并两个已排序的部分时,我的堆栈看起来不像是在排序完成后出现的
org.apache.hadoop.mapred.merge:合并2个已排序的段。
org.apache.hadoop.mapred.merge:向下至最后一个合并过程,总大小剩下2段:116274字节
mapred.YarnChild:Exception running child:java.lang.ArrayIndexOutOfBoundsException您能解释一下您在属性中设置的值以及您对结果的怀疑影响吗?我查了所有的数据,知道这些值意味着什么——但哪些变化以及为什么会得到结果?