Java: Hadoop job stuck on the last map task, need help


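I am using a Hadoop MapReduce job for some text processing. The job is 99.2% complete and is stuck on the last map task.

The last few lines of the map output are shown below. The last time this problem occurred, I printed the keys emitted by the map and noticed that one key had a huge number of values associated with it; I assume the task got stuck sorting those values. I then stopped emitting that key from the map, and the job ran fine.

I think the same problem has occurred again, and printing out the key-value pairs is tedious because the job takes a long time to run. Is there a better option? For example, can Hadoop be configured to skip a few keys if sorting them takes too much time? Does anything like that exist?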

2010-10-20 14:43:32,274 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 14:43:32,274 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 79698262; bufvoid = 99614720
2010-10-20 14:43:32,274 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 6601; length = 327680
2010-10-20 14:43:33,272 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2010-10-20 14:50:44,113 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 14:50:44,113 INFO org.apache.hadoop.mapred.MapTask: bufstart = 79698262; bufend = 59800449; bufvoid = 99614720
2010-10-20 14:50:44,113 INFO org.apache.hadoop.mapred.MapTask: kvstart = 6601; kvend = 9039; length = 327680
2010-10-20 14:50:44,864 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
2010-10-20 14:58:33,105 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 14:58:33,105 INFO org.apache.hadoop.mapred.MapTask: bufstart = 59800449; bufend = 39893455; bufvoid = 99614720
2010-10-20 14:58:33,105 INFO org.apache.hadoop.mapred.MapTask: kvstart = 9039; kvend = 11228; length = 327680
2010-10-20 14:58:33,817 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
2010-10-20 15:06:48,675 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:06:48,675 INFO org.apache.hadoop.mapred.MapTask: bufstart = 39893455; bufend = 20000988; bufvoid = 99614720
2010-10-20 15:06:48,675 INFO org.apache.hadoop.mapred.MapTask: kvstart = 11228; kvend = 13286; length = 327680
2010-10-20 15:06:49,395 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
2010-10-20 15:15:23,514 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:15:23,514 INFO org.apache.hadoop.mapred.MapTask: bufstart = 20000988; bufend = 78879; bufvoid = 99614720
2010-10-20 15:15:23,514 INFO org.apache.hadoop.mapred.MapTask: kvstart = 13286; kvend = 15265; length = 327680
2010-10-20 15:15:24,230 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4
2010-10-20 15:24:35,797 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:24:35,797 INFO org.apache.hadoop.mapred.MapTask: bufstart = 78879; bufend = 79807573; bufvoid = 99614720
2010-10-20 15:24:35,797 INFO org.apache.hadoop.mapred.MapTask: kvstart = 15265; kvend = 17188; length = 327680
2010-10-20 15:24:36,500 INFO org.apache.hadoop.mapred.MapTask: Finished spill 5
2010-10-20 15:33:33,391 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:33:33,391 INFO org.apache.hadoop.mapred.MapTask: bufstart = 79807573; bufend = 59907680; bufvoid = 99614720
2010-10-20 15:33:33,391 INFO org.apache.hadoop.mapred.MapTask: kvstart = 17188; kvend = 19074; length = 327680
2010-10-20 15:33:34,114 INFO org.apache.hadoop.mapred.MapTask: Finished spill 6
2010-10-20 15:42:39,913 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:42:39,913 INFO org.apache.hadoop.mapred.MapTask: bufstart = 59907680; bufend = 40011208; bufvoid = 99614720
2010-10-20 15:42:39,913 INFO org.apache.hadoop.mapred.MapTask: kvstart = 19074; kvend = 20926; length = 327680
2010-10-20 15:42:40,597 INFO org.apache.hadoop.mapred.MapTask: Finished spill 7
2010-10-20 15:51:49,668 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:51:49,668 INFO org.apache.hadoop.mapred.MapTask: bufstart = 40011208; bufend = 20111383; bufvoid = 99614720
2010-10-20 15:51:49,668 INFO org.apache.hadoop.mapred.MapTask: kvstart = 20926; kvend = 22759; length = 327680
2010-10-20 15:51:50,378 INFO org.apache.hadoop.mapred.MapTask: Finished spill 8
2010-10-20 16:01:05,893 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 16:01:05,893 INFO org.apache.hadoop.mapred.MapTask: bufstart = 20111383; bufend = 196929; bufvoid = 99614720
2010-10-20 16:01:05,894 INFO org.apache.hadoop.mapred.MapTask: kvstart = 22759; kvend = 24572; length = 327680
2010-10-20 16:01:06,634 INFO org.apache.hadoop.mapred.MapTask: Finished spill 9
2010-10-20 16:10:25,000 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 16:10:25,000 INFO org.apache.hadoop.mapred.MapTask: bufstart = 196929; bufend = 79900267; bufvoid = 99614720
2010-10-20 16:10:25,000 INFO org.apache.hadoop.mapred.MapTask: kvstart = 24572; kvend = 26370; length = 327680
2010-10-20 16:10:25,776 INFO org.apache.hadoop.mapred.MapTask: Finished spill 10
2010-10-20 16:19:48,283 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 16:19:48,283 INFO org.apache.hadoop.mapred.MapTask: bufstart = 79900267; bufend = 59993676; bufvoid = 99614720
2010-10-20 16:19:48,284 INFO org.apache.hadoop.mapred.MapTask: kvstart = 26370; kvend = 28152; length = 327680
2010-10-20 16:19:49,042 INFO org.apache.hadoop.mapred.MapTask: Finished spill 11

Thanks

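Nothing in Hadoop will know that a particular call to map() is emitting too many key-value pairs. My guess is that there is some kind of loop in your map() function that emits these keys-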
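
Since Hadoop itself will not notice an over-emitting map() call, one possible workaround is to cap the output inside that loop. The following is only a minimal sketch of that idea, not something from the original answer: `CappedEmitter`, `maxPerKey`, and `getDropped` are hypothetical names, and the output collector is modeled as a plain `BiConsumer` so the example runs without Hadoop on the classpath. In a real mapper the sink would wrap whatever call the mapper already uses to emit a pair.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical helper (not part of Hadoop): forwards at most maxPerKey
 * values for each key to the sink and counts everything it drops.
 */
class CappedEmitter<K, V> {
    private final int maxPerKey;
    private final BiConsumer<K, V> sink;
    // Number of values already forwarded for each key seen so far.
    private final Map<K, Integer> emitted = new HashMap<>();
    private long dropped = 0;

    CappedEmitter(int maxPerKey, BiConsumer<K, V> sink) {
        if (maxPerKey < 0) {
            throw new IllegalArgumentException("maxPerKey must be >= 0");
        }
        this.maxPerKey = maxPerKey;
        this.sink = sink;
    }

    /** Forwards the pair unless the key has reached its cap; returns true if forwarded. */
    boolean emit(K key, V value) {
        int count = emitted.getOrDefault(key, 0);
        if (count >= maxPerKey) {
            dropped++;          // over the cap: skip the pair, but remember that we did
            return false;
        }
        emitted.put(key, count + 1);
        sink.accept(key, value);
        return true;
    }

    /** Total number of pairs skipped because their key was over the cap. */
    long getDropped() {
        return dropped;
    }
}
```

Creating one `CappedEmitter` per map() call keeps the per-key count map small, and logging `getDropped()` afterwards shows which inputs are responsible for the skew without printing every key-value pair.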