Hadoop streaming with a shell script: reducer fails with error: No such file or directory
I am using a 10-node HDP cluster, and I am trying to run a simple word count job with a shell script on Bash:
yarn jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.5.0-292.jar \
-mapper 'wc -l' \
-reducer './reducer_wordcount.sh' \
-file /home/pathirippilly/map_reduce_jobs/shell_scripts/reducer_wordcount.sh \
-numReduceTasks 1 \
-input /user/pathirippilly/cards/smalldeck.txt \
-output /user/pathirippilly/mapreduce_jobs/output_shell
Here reducer_wordcount.sh is the reducer shell script, located in my local directory /home/pathirippilly/map_reduce_jobs/shell_scripts.
smalldeck.txt is the input file in the Hadoop directory /user/pathirippilly/cards.
/user/pathirippilly/mapreduce_jobs/output_shell is the output directory.
The Hadoop version I am using is Hadoop 2.7.3.2.6.5.0-292.
I am running the above MapReduce job in YARN mode.
reducer_wordcount.sh contains:
#! /user/bin/env bash
awk '{line_count += $1} END { print line_count }'
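Because Hadoop streaming simply pipes lines into the mapper and reducer on stdin and reads their stdout, the pair can be sanity-checked locally before submitting. A minimal sketch, assuming standard wc, sort and awk; the sample lines are hypothetical, and the input is pre-split into two chunks only because the job log above reports 2 splits. The reducer body is copied from reducer_wordcount.sh.

```shell
#!/usr/bin/env bash
# Local dry run of the streaming job: per-split mapper -> sort -> reducer.
set -euo pipefail

# Hypothetical input, pre-split into two chunks.
split1=$'card 1\ncard 2\n'
split2=$'card 3\n'

# Mapper ('wc -l') runs once per split and emits one count per split.
map_out=$( { printf '%s' "$split1" | wc -l; printf '%s' "$split2" | wc -l; } )

# Map output is sorted before the reduce phase; the reducer body then
# sums the first field of every line it receives.
total=$(printf '%s\n' "$map_out" | sort | awk '{line_count += $1} END { print line_count }')
echo "total lines: $total"
```

If this prints the expected total locally, the remaining problems are in how the script is shipped to and executed on the cluster, not in the awk logic.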
When I run this on my cluster, I get the following error for reducer_wordcount.sh:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:410)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 9 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeReducer.configure(PipeReducer.java:67)
... 14 more
Caused by: java.io.IOException: Cannot run program "/hdp01/hadoop/yarn/local/usercache/pathirippilly/appcache/application_1533622723243_17238/container_e38_1533622723243_17238_01_000004/./reducer_wordcount.sh": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 15 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
I would appreciate any help here; I am fairly new to Hadoop streaming.
The full error stack is as follows:
18/09/09 10:10:02 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [reducer_wordcount.sh] [/usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.5.0-292.jar] /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/streamjob8506373101127930734.jar tmpDir=null
18/09/09 10:10:03 INFO client.RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
18/09/09 10:10:03 INFO client.AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
18/09/09 10:10:03 INFO client.RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
18/09/09 10:10:03 INFO client.AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
18/09/09 10:10:05 INFO mapred.FileInputFormat: Total input paths to process : 1
18/09/09 10:10:06 INFO mapreduce.JobSubmitter: number of splits:2
18/09/09 10:10:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533622723243_17238
18/09/09 10:10:08 INFO impl.YarnClientImpl: Submitted application application_1533622723243_17238
18/09/09 10:10:08 INFO mapreduce.Job: The url to track the job: http://rm01.itversity.com:19288/proxy/application_1533622723243_17238/
18/09/09 10:10:08 INFO mapreduce.Job: Running job: job_1533622723243_17238
18/09/09 10:10:14 INFO mapreduce.Job: Job job_1533622723243_17238 running in uber mode : false
18/09/09 10:10:14 INFO mapreduce.Job: map 0% reduce 0%
18/09/09 10:10:19 INFO mapreduce.Job: map 100% reduce 0%
18/09/09 10:10:23 INFO mapreduce.Job: Task Id : attempt_1533622723243_17238_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:410)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 9 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeReducer.configure(PipeReducer.java:67)
... 14 more
Caused by: java.io.IOException: Cannot run program "/hdp01/hadoop/yarn/local/usercache/pathirippilly/appcache/application_1533622723243_17238/container_e38_1533622723243_17238_01_000004/./reducer_wordcount.sh": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 15 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 16 more
Refer to … and …
Basically, you only need the script's file name, not its path:
-reducer 'reducer_wordcount.sh' -file /local/path/to/reducer_wordcount.sh
Make sure the file is executable:
chmod +x /local/path/to/reducer_wordcount.sh
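You can optionally rename the file using the syntax shown in the links, but since your local script already has the same name as the reducer file, that is not necessary.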
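Combining the advice above with the deprecation warning in the job log (-file option is deprecated, please use generic option -files instead), a full submission could look roughly like the sketch below. Assumptions: the jar and HDFS paths are copied from the question; generic options such as -files are placed before the streaming options, as the Hadoop Streaming documentation requires; the script has already been made executable with chmod +x; and SUBMIT is a hypothetical opt-in variable so the command is only printed unless you explicitly ask to run it.

```shell
#!/usr/bin/env bash
set -euo pipefail

streaming_jar=/usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.5.0-292.jar
reducer_local=/home/pathirippilly/map_reduce_jobs/shell_scripts/reducer_wordcount.sh

# Generic option (-files) first: it ships the local script into each
# container's working directory under its base name. The streaming options
# that follow refer to the shipped script by file name only.
args=(
  -files "$reducer_local"
  -mapper 'wc -l'
  -reducer reducer_wordcount.sh
  -numReduceTasks 1
  -input /user/pathirippilly/cards/smalldeck.txt
  -output /user/pathirippilly/mapreduce_jobs/output_shell
)

# SUBMIT is a hypothetical opt-in switch; by default only print the command.
if [[ "${SUBMIT:-0}" == 1 ]]; then
  yarn jar "$streaming_jar" "${args[@]}"
else
  printf '%q ' yarn jar "$streaming_jar" "${args[@]}"
  printf '\n'
fi
```

Run it with SUBMIT=1 on the edge node to actually submit; without it, the printed line lets you check the argument order first.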
You also need to fix the shebang: #!/usr/bin/env bash
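The shebang alone can produce exactly this error=2: when the interpreter named after #! does not exist, execve fails with ENOENT, which Java's ProcessBuilder reports as "No such file or directory" even though the script itself is present in the container directory. A minimal local reproduction, assuming a Linux host where /user/bin/env does not exist and the temporary directory allows execution; the file names bad_reducer.sh and good_reducer.sh are hypothetical.

```shell
#!/usr/bin/env bash
# Reproduce the failure locally: a script whose shebang names a missing
# interpreter cannot be executed, and the error says "No such file or
# directory" even though the script file itself exists.
set -u
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

body="awk '{line_count += \$1} END { print line_count }'"

# Hypothetical copy of the reducer with the question's shebang (/user/bin/env).
printf '%s\n' '#! /user/bin/env bash' "$body" > "$tmpdir/bad_reducer.sh"
# Same body with the corrected shebang (/usr/bin/env).
printf '%s\n' '#!/usr/bin/env bash' "$body" > "$tmpdir/good_reducer.sh"
chmod +x "$tmpdir/bad_reducer.sh" "$tmpdir/good_reducer.sh"

if printf '3\n' | "$tmpdir/bad_reducer.sh" 2>/dev/null; then
  echo "bad shebang: ran"
else
  echo "bad shebang: failed"
fi
echo "good shebang: $(printf '2\n3\n' | "$tmpdir/good_reducer.sh")"
```

On such a host it should print "bad shebang: failed" followed by "good shebang: 5".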
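By the way, your mapper and reducer are doing the same thing: counting lines, not necessarily words.
But I did give the full path, for -file only: `-file /home/pathirippilly/map_reduce_jobs/shell_scripts/reducer_wordcount.sh`. That is because I submit the job from /home/pathirippilly/, while my reducer lives in the path above. The reducer is not available on the cluster, so -file is the only way to send it there. Even when I run it as follows: `hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.5.0-292.jar -input /user/pathirpilly/cards/smalldeck.txt -output /user/pathirpilly/mapreduce_jobs/output_shell -mapper 'wc -l' -reducer 'reducer_wordcount.sh'` I still get the same error: `No such file or directory`. I am still confused about where I went wrong.
From what you just pasted, you did not give a -file flag, which is still required if you refer to the two links. You also need to set the executable bit on the script.
I don't get it. Could you show me a small example or the complete argument format? I am very new to this.
The examples are in the links... One of them shows a Python script, but that does not matter here. Are you able to run the examples shown in the documentation?
In reducer_wordcount.sh, it is /usr/bin/env, not /user/bin/env.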