Hadoop Streaming error with a Python MapReduce job

I am new to the Hadoop environment. Does anyone know how to fix this error, or what is causing it?

hduser@intel-HP-Pavilion-g6-Notebook-PC:~/hduser/hadoop$ sudo ./bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar  -file /home/hduser/map.py  -mapper /home/hduser/map.py -file /home/hduser/red.py -reducer /home/hduser/red.py  -input /home/hduser/tmp/cddb.txt  -output /home/hduser/op1
packageJobJar: [/home/hduser/map.py, /home/hduser/red.py] [] /tmp/streamjob7455767556382290755.jar tmpDir=null
13/06/20 12:43:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/20 12:43:55 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/20 12:43:55 INFO mapred.FileInputFormat: Total input paths to process : 1
13/06/20 12:43:55 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir.
13/06/20 12:43:56 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-root/mapred/local]
13/06/20 12:43:56 INFO streaming.StreamJob: Running job: job_local_0001
13/06/20 12:43:56 INFO streaming.StreamJob: Job running in-process (local Hadoop)
13/06/20 12:43:56 INFO util.ProcessTree: setsid exited with exit code 0
13/06/20 12:43:56 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@e2081
13/06/20 12:43:56 INFO mapred.MapTask: numReduceTasks: 1
13/06/20 12:43:56 INFO mapred.MapTask: io.sort.mb = 100
13/06/20 12:43:56 INFO mapred.MapTask: data buffer = 79691776/99614720
13/06/20 12:43:56 INFO mapred.MapTask: record buffer = 262144/327680
13/06/20 12:43:56 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/hduser/hadoop/./map.py]
13/06/20 12:43:56 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/06/20 12:43:57 INFO streaming.StreamJob:  map 0%  reduce 0%
13/06/20 12:44:02 INFO mapred.LocalJobRunner: file:/home/hduser/tmp/cddb.txt:0+1205
13/06/20 12:44:03 INFO streaming.StreamJob:  map 100%  reduce 0%
13/06/20 12:48:11 INFO streaming.PipeMapRed: Records R/W=9/1
13/06/20 12:48:11 INFO streaming.PipeMapRed: MRErrorThread done
13/06/20 12:48:11 INFO streaming.PipeMapRed: mapRedFinished
13/06/20 12:48:11 INFO mapred.MapTask: Starting flush of map output
13/06/20 12:48:11 INFO mapred.MapTask: Finished spill 0
13/06/20 12:48:11 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/06/20 12:48:11 INFO mapred.LocalJobRunner: Records R/W=9/1
13/06/20 12:48:11 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/06/20 12:48:11 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c84be9
13/06/20 12:48:11 INFO mapred.LocalJobRunner:
13/06/20 12:48:11 INFO mapred.Merger: Merging 1 sorted segments
13/06/20 12:48:11 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1356 bytes
13/06/20 12:48:11 INFO mapred.LocalJobRunner:
13/06/20 12:48:11 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/hduser/hadoop/./red.py]
13/06/20 12:48:11 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/06/20 12:48:11 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
Traceback (most recent call last):
  File "/home/hduser/hduser/hadoop/./red.py", line 30, in <module>
    main()
  File "/home/hduser/hduser/hadoop/./red.py", line 19, in main
    for similarity, group in groupby(data, itemgetter(0), reverse=True):
TypeError: groupby() takes at most 2 arguments (3 given)
13/06/20 12:48:11 INFO streaming.PipeMapRed: MRErrorThread done
13/06/20 12:48:11 INFO streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
13/06/20 12:48:11 WARN mapred.LocalJobRunner: job_local_0001
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
13/06/20 12:48:12 INFO streaming.StreamJob: Job running in-process (local Hadoop)
13/06/20 12:48:12 ERROR streaming.StreamJob: Job not successful. Error: NA
13/06/20 12:48:12 INFO streaming.StreamJob: killJob...
Streaming Command Failed!

I am using Hadoop 1.0.4, and the MapReduce job is written in Python (using Hadoop Streaming).

The error is clear from the traceback:

Traceback (most recent call last):
  File "/home/hduser/hduser/hadoop/./red.py", line 30, in <module>
    main()
  File "/home/hduser/hduser/hadoop/./red.py", line 19, in main
    for similarity, group in groupby(data, itemgetter(0), reverse=True):
TypeError: groupby() takes at most 2 arguments (3 given)

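groupby accepts only two arguments: the iterable and an optional key function. It has no reverse parameter, so the call on line 19 of red.py raises a TypeError, the reducer exits with code 1, and Hadoop Streaming reports "subprocess failed with code 1". See the documentation for itertools.groupby.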
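
Since the source of red.py was not posted, the following is only a minimal sketch of one possible fix, assuming the reducer wants groups ordered from highest to lowest similarity. The function name `group_by_similarity` and the sample records are hypothetical. The idea is to do the descending sort with `sorted(..., reverse=True)` first, then pass the sorted data to `groupby` with just the iterable and the key:

```python
from itertools import groupby
from operator import itemgetter


def group_by_similarity(records):
    """Group (similarity, value) pairs by similarity, highest similarity first.

    Hypothetical helper: the real red.py may read and shape its data differently.
    """
    # groupby only merges *adjacent* items with equal keys, so sort first.
    # The reverse=True flag belongs to sorted(), not to groupby().
    ordered = sorted(records, key=itemgetter(0), reverse=True)
    return [
        (similarity, [value for _, value in group])
        for similarity, group in groupby(ordered, key=itemgetter(0))
    ]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [(0.5, "a"), (0.9, "b"), (0.5, "c"), (0.9, "d")]
    for similarity, values in group_by_similarity(sample):
        print("%s\t%s" % (similarity, ",".join(values)))
```

Note that this sorts the whole input in memory, which is fine for a small file like the 1205-byte cddb.txt in the log but may not suit large reducer inputs.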

Please post the code in the body of the question, in a code block (not on a pastebin).