R Streaming Command Failed!

I have installed RHadoop on the Hortonworks VM. When I run a MapReduce job to verify the installation, it throws an error.

I am using the user rstudio (not root, but it has sudoer access).

Streaming Command Failed!

Can someone help me understand this issue? I do not have many ideas about how to solve this problem.

    Sys.setenv(HADOOP_HOME="/usr/hdp/2.2.0.0-2041/hadoop")
    Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
    Sys.setenv(HADOOP_STREAMING="/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-streaming.jar")
    library(rhdfs)
    hdfs.init()
    library(rmr2)
    ints = to.dfs(1:10)
    calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))
I get an error; here is the error output from RHadoop:

Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, : hadoop streaming failed with error code 1

4: stop("hadoop streaming failed with error code ", retval, "\n")
3: mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, in.folder = if (is.list(input)) { lapply(input, to.dfs.path) } else to.dfs.path(input), out.folder = to.dfs.path(output), ...
2: mapreduce(input = input, output = output, input.format = "text", map = map)
1: wordcount(hdfs.data, hdfs.out)



packageJobJar: [] [/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-streaming-2.6.0.2.2.0.0-2041.jar] /tmp/streamjob3075733686753367992.jar tmpDir=null
15/04/07 21:43:10 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
15/04/07 21:43:10 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
15/04/07 21:43:11 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
15/04/07 21:43:11 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
15/04/07 21:43:11 INFO mapred.FileInputFormat: Total input paths to process : 1
15/04/07 21:43:11 INFO mapreduce.JobSubmitter: number of splits:2
15/04/07 21:43:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1428440418649_0006
15/04/07 21:43:12 INFO impl.YarnClientImpl: Submitted application application_1428440418649_0006
15/04/07 21:43:12 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1428440418649_0006/
15/04/07 21:43:12 INFO mapreduce.Job: Running job: job_1428440418649_0006
15/04/07 21:43:19 INFO mapreduce.Job: Job job_1428440418649_0006 running in uber mode : false
15/04/07 21:43:19 INFO mapreduce.Job:  map 0% reduce 0%
15/04/07 21:43:27 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

15/04/07 21:43:27 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/04/07 21:43:35 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/04/07 21:43:35 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/04/07 21:43:43 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/04/07 21:43:44 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

15/04/07 21:43:52 INFO mapreduce.Job:  map 100% reduce 0%
15/04/07 21:43:53 INFO mapreduce.Job: Job job_1428440418649_0006 failed with state FAILED due to: Task failed task_1428440418649_0006_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/04/07 21:43:54 INFO mapreduce.Job: Counters: 13
    Job Counters 
        Failed map tasks=7
        Killed map tasks=1
        Launched map tasks=8
        Other local map tasks=6
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=49670
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=49670
        Total vcore-seconds taken by all map tasks=49670
        Total megabyte-seconds taken by all map tasks=12417500
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
15/04/07 21:43:54 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1

Your code worked fine for me after changing HADOOP_CMD and HADOOP_STREAMING to match my system configuration (I am running Hadoop 2.4.0 on Ubuntu 14.04).

My suggestions are:

  • Make sure a working Hadoop instance is running, i.e. running jps in a terminal should list the Hadoop daemons (NameNode, DataNode, ResourceManager, NodeManager, and so on).
  • Make sure the rJava package is loaded when you load library(rhdfs).
  • Make sure you are referencing the correct streaming jar file (see the sanity check sketched below).
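
For the last two points, a quick sanity check from the R session before submitting a job can save some digging. This is only a minimal sketch and assumes HADOOP_CMD and HADOOP_STREAMING have already been set with Sys.setenv(), as in the code further down:

# Verify that the exported paths point to real files; a wrong HADOOP_STREAMING
# path is a common cause of "Streaming Command Failed!".
stopifnot(file.exists(Sys.getenv("HADOOP_CMD")))
stopifnot(file.exists(Sys.getenv("HADOOP_STREAMING")))

# rhdfs depends on rJava; loading it should print "Loading required package: rJava".
# If rJava itself fails to load, fix that first (e.g. re-run "sudo R CMD javareconf").
library(rhdfs)
hdfs.init()
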
Here is the R code and output:

Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")

library(rhdfs)
# Loading required package: rJava
# HADOOP_CMD=/usr/local/hadoop/bin/hadoop
# Be sure to run hdfs.init()

hdfs.init()
library(rmr2)
ints = to.dfs(1:10)
calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))
Output:

15/04/07 05:18:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/07 05:18:45 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
packageJobJar: [/usr/local/hadoop/data/hadoop-unjar1328285833881826794/] [] /tmp/streamjob6167004817219806828.jar tmpDir=null
15/04/07 05:18:47 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
15/04/07 05:18:47 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050
15/04/07 05:18:48 INFO mapred.FileInputFormat: Total input paths to process : 1
15/04/07 05:18:49 INFO mapreduce.JobSubmitter: number of splits:2
15/04/07 05:18:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1428363713092_0002
15/04/07 05:18:49 INFO impl.YarnClientImpl: Submitted application application_1428363713092_0002
15/04/07 05:18:50 INFO mapreduce.Job: The url to track the job: http://manohar-dt:8088/proxy/application_1428363713092_0002/
15/04/07 05:18:50 INFO mapreduce.Job: Running job: job_1428363713092_0002
15/04/07 05:19:00 INFO mapreduce.Job: Job job_1428363713092_0002 running in uber mode : false
15/04/07 05:19:00 INFO mapreduce.Job:  map 0% reduce 0%
15/04/07 05:19:15 INFO mapreduce.Job:  map 50% reduce 0%
15/04/07 05:19:16 INFO mapreduce.Job:  map 100% reduce 0%
15/04/07 05:19:17 INFO mapreduce.Job: Job job_1428363713092_0002 completed successfully
15/04/07 05:19:17 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=194356
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=979
        HDFS: Number of bytes written=919
        HDFS: Number of read operations=14
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Job Counters 
        Launched map tasks=2
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=25803
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=25803
        Total vcore-seconds taken by all map tasks=25803
        Total megabyte-seconds taken by all map tasks=26422272
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=186
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=293
        CPU time spent (ms)=3640
        Physical memory (bytes) snapshot=322818048
        Virtual memory (bytes) snapshot=2107604992
        Total committed heap usage (bytes)=223346688
    File Input Format Counters 
        Bytes Read=793
    File Output Format Counters 
        Bytes Written=919
15/04/07 05:19:17 INFO streaming.StreamJob: Output directory: /tmp/file11d247219866

Hope this helps.

Your current implementation uses RStudio. Could you try writing the code in a .R file and running it directly with hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar -input <input file in HDFS> -output <HDFS output dir> -file <mapper file> -file <reducer file> -mapper mapper.R?
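
If you try that route, the mapper is simply an R script that reads lines from stdin and writes tab-separated key/value pairs to stdout. A minimal, hypothetical mapper.R (the file name and word-count logic here are only for illustration, not taken from the original post) could look like this:

#!/usr/bin/env Rscript
# Minimal Hadoop Streaming mapper in R: read lines from stdin and
# emit "word<TAB>1" pairs on stdout.
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  words <- unlist(strsplit(line, "[[:space:]]+"))
  words <- words[nzchar(words)]
  for (w in words) cat(w, "\t1\n", sep = "")
}
close(con)

The reducer script works the same way, and every script you pass with -file is shipped into each task's working directory.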

By the way, the PipeMapRed.waitOutputThreads() exception is only raised when the correct input/output paths are not specified. Please check your paths.
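
One quick way to check from the R session, assuming rhdfs and rmr2 are already initialised (a sketch; the directory below is only an example):

# Confirm that the data written with to.dfs() can actually be read back:
print(from.dfs(ints))   # should show $key (NULL) and $val containing 1:10

# Confirm that the HDFS directories you pass as input/output really exist:
hdfs.ls("/tmp")   # example path; adjust to the directory your job uses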


This should work.

Hi Manohar... Even though I wrote exactly the same thing, the job still will not run, and that is the problem I cannot solve. I have tried many combinations to fix it. I know the code itself is fine, so this answer does not help me. I am using Hortonworks, and I believe the paths for HADOOP_CMD and HADOOP_STREAMING are correct; beyond that I cannot see any other issue.

Hi Aman, could you paste the full text of the error output?