Python Hadoop Streaming hangs at Output: /path../Output
Hi, I wrote two Python scripts to act as the mapper and reducer for Hadoop Streaming. I ran the job, and both map and reduce reached 100%. But the process just hangs there at the end. The output looks like this:
...
13/10/07 17:25:16 INFO streaming.StreamJob: map 99% reduce 30%
13/10/07 17:26:18 INFO streaming.StreamJob: map 99% reduce 31%
13/10/07 17:26:55 INFO streaming.StreamJob: map 99% reduce 32%
13/10/07 17:28:16 INFO streaming.StreamJob: map 100% reduce 32%
13/10/07 17:29:08 INFO streaming.StreamJob: map 100% reduce 33%
13/10/07 17:30:55 INFO streaming.StreamJob: map 100% reduce 39%
13/10/07 17:30:56 INFO streaming.StreamJob: map 100% reduce 46%
13/10/07 17:30:57 INFO streaming.StreamJob: map 100% reduce 52%
13/10/07 17:30:58 INFO streaming.StreamJob: map 100% reduce 72%
13/10/07 17:31:00 INFO streaming.StreamJob: map 100% reduce 74%
13/10/07 17:31:01 INFO streaming.StreamJob: map 100% reduce 89%
13/10/07 17:31:02 INFO streaming.StreamJob: map 100% reduce 98%
13/10/07 17:31:03 INFO streaming.StreamJob: map 100% reduce 99%
13/10/07 17:31:57 INFO streaming.StreamJob: map 100% reduce 100%
13/10/07 17:32:00 INFO streaming.StreamJob: Job complete: job_201309301959_0100
13/10/07 17:32:00 INFO streaming.StreamJob: Output: /tmp/binwang_31
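Our cluster is monitored by Ganglia, and I can clearly see that all nodes have returned to normal and are not doing any heavy computation. I also checked HDFS and found my output there (I am not sure whether it is complete). It looks to me as if the whole MapReduce job finished successfully, yet the terminal has been hanging at the last step for more than 10 minutes.
I would like to know how this can happen, and whether I should press CTRL+Z to stop it or give it a few more minutes. Does anyone know whether the "Output: ..." step should take this long?
If not, what could be the cause?
Here is the response I got when I opened another session and ran this command: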
$ /usr/bin/hadoop job -status job_201309301959_0100
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.
Job: job_201309301959_0100
file: hdfs://url1:8020/user/user1/.staging/job_201309301959_0100/job.xml
tracking URL: http://url1:50030/jobdetails.jsp?jobid=job_201309301959_0100
map() completion: 1.0
reduce() completion: 1.0
Counters: 34
File System Counters
FILE: Number of bytes read=232427562
FILE: Number of bytes written=835363817
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=107873895369
HDFS: Number of bytes written=51760077
HDFS: Number of read operations=1722
HDFS: Number of large read operations=0
HDFS: Number of write operations=144
Job Counters
Launched map tasks=803
Launched reduce tasks=72
Data-local map tasks=731
Rack-local map tasks=72
Total time spent by all maps in occupied slots (ms)=521490905
Total time spent by all reduces in occupied slots (ms)=47701745
Total time spent by all maps waiting after reserving slots (ms)=0
Total time spent by all reduces waiting after reserving slots (ms)=0
Map-Reduce Framework
Map input records=425093
Map output records=10311822
Map output bytes=906412336
Input split bytes=111617
Combine input records=0
Combine output records=0
Reduce input groups=550636
Reduce shuffle bytes=452246236
Reduce input records=10311822
Reduce output records=550636
Spilled Records=20623644
CPU time spent (ms)=479770510
Physical memory (bytes) snapshot=533152505856
Virtual memory (bytes) snapshot=1439405166592
Total committed heap usage (bytes)=844896337920
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ=107742318536
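Thanks in advance.

Showing the code you wrote for the mapper and reducer would help us help you.

Sorry, the code is too long to post, more than 200 lines. The mapper basically parses HTML and prints the result, and the reducer does some time series analysis. Based on what is shown here, do you think I can say the job has actually finished?

The output is incomplete; you are missing the last two lines. The last one should be BYTES_WRITTEN… Perhaps something happened that left the job hanging. You could try running it again and see whether it completes smoothly.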
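
For anyone who wants to check the `job -status` text from a script rather than by eye, here is a minimal sketch in Python. It assumes the output has the same shape as the listing in the question (`map() completion: ...`, `reduce() completion: ...`, then `Name=value` counter lines). The function names `parse_job_status` and `looks_finished` are hypothetical helpers, not part of Hadoop. Note that both fractions being 1.0 is only a hint: it does not prove the client process will return or that the output was fully committed.

```python
import re

def parse_job_status(text):
    """Parse text shaped like the `hadoop job -status <job_id>` output above.

    Returns (completion, counters): completion maps "map"/"reduce" to a
    float fraction, counters maps each counter name to an int.
    """
    completion = {}
    counters = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"^(map|reduce)\(\) completion:\s*([0-9.]+)$", line)
        if m:
            completion[m.group(1)] = float(m.group(2))
            continue
        # Counter lines look like "Name=12345"; group headers have no "=".
        name, sep, value = line.rpartition("=")
        if sep and value.isdigit():
            counters[name] = int(value)
    return completion, counters

def looks_finished(completion):
    """Heuristic only: True when both phases report a fraction of 1.0."""
    return completion.get("map") == 1.0 and completion.get("reduce") == 1.0

# A few lines copied from the status output in the question.
sample = """\
map() completion: 1.0
reduce() completion: 1.0
Map output records=10311822
Reduce input records=10311822
Reduce output records=550636
"""

completion, counters = parse_job_status(sample)
print(looks_finished(completion))
# Every map output record should have reached a reducer.
print(counters["Map output records"] == counters["Reduce input records"])
```

In the question's listing, `Map output records` and `Reduce input records` are both 10311822, which is consistent with the reducers having consumed all the map output.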