Facing problems in Mapper.py and Reducer.py when running Python code on a Hadoop cluster

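I am running this code on a Hadoop cluster to get the probability of each value in my CSV file.

When I run the code on the cluster, I get the error "java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1". Can anyone fix my code?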

#!/usr/bin/env python3
"""mapper.py"""
import sys

# Get input lines from stdin
for line in sys.stdin:
    # Remove spaces from beginning and end of the line
    line = line.strip()

    # Split it into tokens
    #tokens = line.split()

    #Get probability_mass values
    for probability_mass in line:
        print(str(probability_mass)+ '\t1')

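The real error should be available in the YARN UI. Separately, using the probability as the key does not let you sum all the values at once, because they will all end up in different reducers.

If you have no key to group the values by, you can emit a constant key like this, which funnels all the data into a single reducer: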

print("%s\t%s" % (None, probability))
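
As a rough illustration of the constant-key idea, here is a minimal mapper sketch. It assumes one value per input line and a header row that is literally `marks`, as in the sample data; the function name `map_lines` and the `header` parameter are made up for this example. It reads from an in-memory list so it can be run anywhere; in a streaming job you would pass `sys.stdin` instead.

```python
#!/usr/bin/env python3
"""mapper.py sketch: emit every value under one constant key."""


def map_lines(lines, header="marks"):
    """Yield 'None<TAB>value' for each non-empty, non-header line.

    `header` is an assumption about the CSV's first row.
    """
    for line in lines:
        value = line.strip()
        # Skip blank lines and the (assumed) header row
        if not value or value == header:
            continue
        # The constant key sends every record to the same reducer
        yield "%s\t%s" % (None, value)


# Stand-in for sys.stdin so the sketch runs without Hadoop
sample = ["marks\n", "10\n", "10\n", "60\n", "10\n", "30\n"]
for record in map_lines(sample):
    print(record)
```

Note that the whole stripped line is emitted as the value, rather than looping over the characters of the line.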

Here is a working example that produces the output you want. I tested it only with an input file, not with Hadoop.

import sys
from collections import defaultdict

counts = defaultdict(int)

# Get input from stdin
for line in sys.stdin:
    #Remove spaces from beginning and end of the line
    line = line.strip()

    # skip empty lines
    if not line:
        continue  

    # parse the input from mapper.py
    k,v = line.split('\t', 1)
    counts[v] += 1

total = float(sum(counts.values()))
probability_mass = {k:v/total for k,v in counts.items()}
print(probability_mass)
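You can test it locally with the following pipeline. It uses plain sort, because sort -u would drop duplicate lines and skew the counts.

cat file.txt | python mapper.py | sort | python reducer.py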
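
The same map, sort, reduce flow can be simulated in a single Python process, which may help when debugging without a shell pipeline. This is only a sketch under assumptions: the names `mapper` and `reducer` are illustrative functions, not the scripts above, and the input is the five sample values with no header. It also shows what happens if duplicate records are removed before reducing, which is the effect `sort -u` would have.

```python
from collections import Counter


def mapper(lines):
    """Emit 'None<TAB>value' for each non-empty line."""
    return ["None\t%s" % line.strip() for line in lines if line.strip()]


def reducer(records):
    """Count each value and divide by the total number of records."""
    counts = Counter(record.split("\t", 1)[1] for record in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


sample = ["10", "10", "60", "10", "30"]

# sorted() stands in for `sort` (or the Hadoop shuffle): duplicates are kept
result = reducer(sorted(mapper(sample)))
print(result)  # {'10': 0.6, '30': 0.2, '60': 0.2}

# Deduplicating first collapses the three '10' records into one,
# so every distinct value ends up with the same probability
deduped_result = reducer(sorted(set(mapper(sample))))
print(deduped_result)
```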


Also, mrjob or pyspark are higher-level libraries that provide more useful features.

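In the reducer, ClassA will always be assigned the last value from the mapper.

That was my mistake. In the last example, the reducer would do print("%s\t%s" % (probability_mass, Classprob[probability_mass])).

You cannot have a header in MapReduce. You should include a sample dataset (the first 20 lines) and the expected output in the question.

I have attached the dataset and the expected output.

Note: your problem is not a good fit for MapReduce, because you have to know the total of all values ahead of time. You therefore have to output (None, ClassA) from the mapper. You still have not posted the actual error, which comes from the YARN UI.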
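
To make the "total must be known first" point concrete, here is a hedged sketch of a reducer that buffers all counts, computes the total once the input is exhausted, and only then emits one tab-separated `value<TAB>probability` line per value. It assumes every record arrives under the single `None` key at one reducer; `reduce_records` is a hypothetical name, and the hard-coded `records` list stands in for `sys.stdin`.

```python
from collections import Counter


def reduce_records(records):
    """Yield 'value<TAB>probability' lines after reading all records."""
    counts = Counter()
    for record in records:
        record = record.strip()
        if not record:
            continue
        # Each record is 'key<TAB>value'; the key is ignored
        _key, value = record.split("\t", 1)
        counts[value] += 1

    # The total is only known once every record has been seen
    total = sum(counts.values())
    for value, count in counts.items():
        yield "%s\t%s" % (value, count / total)


# Stand-in for the mapper output that Hadoop would feed to the reducer
records = ["None\t10", "None\t10", "None\t60", "None\t10", "None\t30"]
for line in reduce_records(records):
    print(line)
```

Because all counts are held in memory in one reducer, this approach only suits inputs with a modest number of distinct values.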
marks
10
10
60
10
30
Expected output: the probability of each number

{10: 0.6, 60: 0.2, 30: 0.2}

But the result still looks like this:
{1:1} {1:1} {1:1} {1:1} {1:1} {1:1}

Output of reducer.py for the sample data:

{'10': 0.6, '60': 0.2, '30': 0.2}