Python mrjob hangs forever when running on Hadoop
I am working through the tutorial in the documentation. The word count works on a local file, but then I tried
python mr.py -r hadoop 1.txt
and it just hangs.
When I interrupt it from the keyboard, the log is:
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /var/folders/zv/1hqhxh0n6m374cwzysmdn6zc0000gn/T/mr.yd006t.20150508.194506.047719
writing wrapper script to /var/folders/zv/1hqhxh0n6m374cwzysmdn6zc0000gn/T/mr.yd006t.20150508.194506.047719/setup-wrapper.sh
Using Hadoop version 2.7.0
Copying local files into hdfs:///user/yd006t/tmp/mrjob/mr.yd006t.20150508.194506.047719/files/
^CTraceback (most recent call last):
  File "mr.py", line 16, in <module>
    MRWordFrequencyCount.run()
  File "/Library/Python/2.7/site-packages/mrjob/job.py", line 461, in run
    mr_job.execute()
  File "/Library/Python/2.7/site-packages/mrjob/job.py", line 479, in execute
    super(MRJob, self).execute()
  File "/Library/Python/2.7/site-packages/mrjob/launch.py", line 151, in execute
    self.run_job()
  File "/Library/Python/2.7/site-packages/mrjob/launch.py", line 214, in run_job
    runner.run()
  File "/Library/Python/2.7/site-packages/mrjob/runner.py", line 464, in run
    self._run()
  File "/Library/Python/2.7/site-packages/mrjob/hadoop.py", line 237, in _run
    self._run_job_in_hadoop()
  File "/Library/Python/2.7/site-packages/mrjob/hadoop.py", line 339, in _run_job_in_hadoop
    self._process_stderr_from_streaming(master)
  File "/Library/Python/2.7/site-packages/mrjob/hadoop.py", line 388, in _process_stderr_from_streaming
    for line in treat_eio_as_eof(stderr):
  File "/Library/Python/2.7/site-packages/mrjob/hadoop.py", line 381, in treat_eio_as_eof
    yield iter.next() # okay for StopIteration to bubble up
KeyboardInterrupt
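The job is processing... Please post the contents of mr.py; something in your MR code may be causing this behavior.

In fact, I just copied this mr.py from the official documentation... Thanks for your reply. I am using Hadoop 2.7.

How long did you let it hang? How long should this code take to run? Given your machine's RAM, are you sure that what is happening is unexpected?

I see that it was copying files when you interrupted it. If the files are large and/or numerous, that operation can take a while.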
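One way to tell a slow copy from an unresponsive cluster is to run a plain HDFS command under a timeout, outside of mrjob. The following is only a rough sketch. It assumes Python 3 and that the `hadoop` binary is on the PATH. The helper names `run_with_timeout` and `probe_hdfs` and the 30-second limit are made up for illustration and are not part of mrjob or Hadoop.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, output).

    status is 'ok', 'failed', 'timeout', or 'missing' (hypothetical labels).
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return "timeout", ""
    except FileNotFoundError:
        # The executable was not found on the PATH.
        return "missing", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout


def probe_hdfs(timeout_s=30):
    """List the HDFS root; the timeout value is an arbitrary assumption."""
    return run_with_timeout(["hadoop", "fs", "-ls", "/"], timeout_s)
```

Calling `probe_hdfs()` and getting `'timeout'` or `'failed'` would suggest that the problem lies with the Hadoop setup rather than with mr.py, while `'ok'` would point back at the job submission itself.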
from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()
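To rule out the job logic itself, the same mapper and reducer can be exercised without mrjob or Hadoop. This is a minimal sketch of the map, group, and reduce steps in plain Python, not how mrjob runs the job internally. The function `run_locally` is a hypothetical helper, and the input lines are assumed to have their trailing newlines already removed.

```python
from collections import defaultdict


def mapper(line):
    # Same emissions as MRWordFrequencyCount.mapper, minus self and the key.
    yield "chars", len(line)
    yield "words", len(line.split())
    yield "lines", 1


def reducer(key, values):
    # Same logic as MRWordFrequencyCount.reducer.
    yield key, sum(values)


def run_locally(lines):
    """Group mapper output by key, then reduce each group (hypothetical helper)."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    result = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            result[out_key] = out_value
    return result
```

For example, `run_locally(["hello world", "foo"])` gives 14 characters, 3 words, and 2 lines. If the counts look right here and with the local runner, the hang is more likely related to the Hadoop side than to the job code.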