Python Hadoop error "Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1"


mapper.py works fine. I ran mapper.py on the cluster and stored its output in part-0.txt.

Much like a word-count job, I am trying to count the occurrences of each distinct key stored in the part-0.txt file.

I tried copy-pasting the code from this link:

It worked, but I could not understand its reducer code, so I wrote my own reducer.

This is the reducer code:

#!/usr/bin/env python
from numpy import *
import sys

arr = []
previous_printed_word = ''
#f=open('/home/nalin/Downloads/part-0.txt','r')

for line in sys.stdin:
    line = line.strip()
    current_word, current_count = line.split('\t',1)
    current_count = 0

    if(previous_printed_word != current_word):
        #f2 = open('/home/nalin/Downloads/part-0.txt', 'r')
        for line2 in sys.stdin:
            line2 = line2.strip()
            word, count2 = line2.split('\t', 1)
            count2 = int(count2)
            if current_word == word:
                current_count = current_count + count2
            else:
                continue
        print '%s\t\t\t%d' % (current_word, current_count-1)
        arr.append ( [current_word, current_count-1] )
        previous_printed_word = current_word

arr = sorted(arr, key=lambda row: row[1])
#print arr
length=len(arr)
print "LENGHT OF 2-D ARRAY IS = ",length
for i in range(1,11):
    print arr[length-i]
I keep getting this error:

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1

I tried looking up what this error means, and found that it appears when something is wrong in the code itself.

But if I uncomment these two lines:

f = open('/home/nalin/Downloads/part-0.txt', 'r')
f2 = open('/home/nalin/Downloads/part-0.txt', 'r')

and replace the two uses of sys.stdin with f and f2 respectively, then it works like a charm.
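A likely culprit is the nested `for line2 in sys.stdin` loop: on the cluster, stdin is a pipe and can only be consumed once, so the inner loop drains the same stream the outer loop is still iterating. Re-opening a real file works because a file can be read from the beginning again, which matches the behaviour described above. Since Hadoop Streaming sorts the mapper output by key before it reaches the reducer, a single pass over stdin is enough. A minimal single-pass sketch (the function name `reduce_counts` is illustrative, not from the original post):

```python
#!/usr/bin/env python
import sys


def reduce_counts(lines):
    """Sum counts per key from sorted 'key<TAB>count' lines in one pass.

    Assumes all lines for the same key arrive consecutively, which is
    what Hadoop Streaming's shuffle/sort guarantees.
    """
    totals = []
    current_word, current_count = None, 0
    for line in lines:
        word, count = line.strip().split('\t', 1)
        count = int(count)
        if word == current_word:
            current_count += count
        else:
            # Key changed: emit the finished key, start the new one.
            if current_word is not None:
                totals.append((current_word, current_count))
            current_word, current_count = word, count
    if current_word is not None:
        totals.append((current_word, current_count))
    return totals


if __name__ == '__main__':
    for word, total in reduce_counts(sys.stdin):
        print('%s\t%d' % (word, total))
```

Reading stdin exactly once also avoids the off-by-one `current_count-1` adjustments in the original code, which compensated for the line already consumed by the outer loop.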

When I run it locally on the mapper's output file, it works. When I run it on the cluster, it does not.
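The cluster behaviour can be reproduced locally by feeding the reducer through a pipe rather than a file, with `sort` standing in for Hadoop's shuffle phase. The sketch below uses inline sample data and an inline stand-in reducer so it is self-contained; in practice the real mapper and reducer scripts from the post would sit in the pipeline instead:

```shell
# Simulate map -> shuffle -> reduce locally. Hadoop Streaming sorts
# mapper output by key before the reducer sees it; `sort -k1,1` stands
# in for that step. The python -c body is a stand-in reducer.
printf 'b\t1\na\t1\na\t1\n' \
  | sort -k1,1 \
  | python3 -c '
import sys
prev, total = None, 0
for line in sys.stdin:
    word, count = line.strip().split("\t", 1)
    count = int(count)
    if word == prev:
        total += count
    else:
        if prev is not None:
            print("%s\t%d" % (prev, total))
        prev, total = word, count
if prev is not None:
    print("%s\t%d" % (prev, total))
'
```

If the reducer only works when its input comes from a seekable file, this pipe-based run will reproduce the cluster failure on your own machine.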


Please help me find the error in my code.

How do you execute the job? Using the following command:

bin/hadoop jar /home/nalin/hadoop-2.7.3/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar \
    -mapper "python /home/nalin/PycharmProjects/ForHadoop/ml_1n_mapper.py" \
    -reducer "python /home/nalin/PycharmProjects/ForHadoop/ml_1n_reducer.py" \
    -input "/input4/ratings.dat" \
    -output "wordcount12"