
Accessing Hadoop from Python -- java.io.IOException: The pipe has been ended


I am trying to run a mapper/reducer job with Hadoop on Windows 10 and I get the error below. I have looked everywhere but could not find a solution. Essentially:
java.io.IOException: The pipe has been ended

What have I tried?
  • Added the shebang line
  • Changed -mapper mapper.py to -mapper "python"
  • Checked the mapper and reducer code and all the configuration files (the XML files); a local check of the Streaming pipe is sketched right after this list
  • Added the mapper and reducer to the same HDFS folder as the input file
  • None of the above worked
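For reference, the contract Hadoop Streaming imposes is simple: the input split is written to the mapper's stdin, the mapper's stdout is sorted, and the sorted lines are fed to the reducer's stdin. That pipe can be reproduced locally, without Hadoop, to check whether the scripts themselves survive it. A rough Python sketch, assuming local copies of mapper.py, reducer.py and the input file (only the file names are taken from this post; everything else is illustrative):

    # Local simulation of the Hadoop Streaming pipe (a sketch, not part of the job):
    #   type inputfile.txt | python mapper.py | sort | python reducer.py
    import subprocess
    import sys

    with open("inputfile.txt", "rb") as f:          # assumed local copy of the HDFS input
        mapped = subprocess.run([sys.executable, "mapper.py"],
                                stdin=f, capture_output=True, check=True)

    # Streaming sorts the mapper output before the reducer sees it
    shuffled = b"\n".join(sorted(mapped.stdout.splitlines())) + b"\n"

    reduced = subprocess.run([sys.executable, "reducer.py"],
                             input=shuffled, capture_output=True, check=True)
    sys.stdout.buffer.write(reduced.stdout)

If either script crashes or exits early here, the Hadoop side would see the same failure as a broken pipe. The full log from the failing run follows.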

    2020-10-21 01:52:36,861 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
    2020-10-21 01:52:37,002 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
    2020-10-21 01:52:37,002 INFO impl.MetricsSystemImpl: JobTracker metrics system started
    2020-10-21 01:52:37,028 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
    2020-10-21 01:52:37,964 INFO mapred.FileInputFormat: Total input files to process : 1
    2020-10-21 01:52:38,209 INFO mapreduce.JobSubmitter: number of splits:1
    2020-10-21 01:52:38,386 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local105213688_0001
    2020-10-21 01:52:38,387 INFO mapreduce.JobSubmitter: Executing with tokens: []
    2020-10-21 01:52:38,557 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    2020-10-21 01:52:38,561 INFO mapreduce.Job: Running job: job_local105213688_0001
    2020-10-21 01:52:38,561 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    2020-10-21 01:52:38,566 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
    2020-10-21 01:52:38,579 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
    2020-10-21 01:52:38,579 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
    2020-10-21 01:52:38,689 INFO mapred.LocalJobRunner: Waiting for map tasks
    2020-10-21 01:52:38,694 INFO mapred.LocalJobRunner: Starting task: attempt_local105213688_0001_m_000000_0
    2020-10-21 01:52:38,732 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
    2020-10-21 01:52:38,733 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
    2020-10-21 01:52:38,745 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
    2020-10-21 01:52:38,801 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@6b965617
    2020-10-21 01:52:38,820 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/inp/inputfile.txt:0+6488666
    2020-10-21 01:52:38,883 INFO mapred.MapTask: numReduceTasks: 1
    2020-10-21 01:52:39,001 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    2020-10-21 01:52:39,001 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    2020-10-21 01:52:39,003 INFO mapred.MapTask: soft limit at 83886080
    2020-10-21 01:52:39,004 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    2020-10-21 01:52:39,005 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    2020-10-21 01:52:39,012 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    2020-10-21 01:52:39,115 INFO streaming.PipeMapRed: PipeMapRed exec [D:\Wireshark\uninstall.exe, D:\ll\sem5\Cloud_Computing\assignment\lab6-7\lab7\mapper.py]
    2020-10-21 01:52:39,127 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
    2020-10-21 01:52:39,129 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
    2020-10-21 01:52:39,130 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
    2020-10-21 01:52:39,131 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    2020-10-21 01:52:39,133 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
    2020-10-21 01:52:39,137 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
    2020-10-21 01:52:39,138 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
    2020-10-21 01:52:39,140 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    2020-10-21 01:52:39,141 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
    2020-10-21 01:52:39,143 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
    2020-10-21 01:52:39,144 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
    2020-10-21 01:52:39,147 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
    2020-10-21 01:52:39,470 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
    2020-10-21 01:52:39,471 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
    2020-10-21 01:52:39,474 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
    2020-10-21 01:52:39,481 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
    2020-10-21 01:52:39,584 INFO mapreduce.Job: Job job_local105213688_0001 running in uber mode : false
    2020-10-21 01:52:39,586 INFO mapreduce.Job:  map 0% reduce 0%
    2020-10-21 01:52:40,423 INFO streaming.PipeMapRed: MRErrorThread done
    2020-10-21 01:52:40,425 INFO streaming.PipeMapRed: R/W/S=1299/0/0 in:1299=1299/1 [rec/s] out:0=0/1 [rec/s]
    minRecWrittenToEnableSkip_=9223372036854775807 HOST=null
    USER=null
    HADOOP_USER=null
    last tool output: |null|
    
    java.io.IOException: The pipe has been ended
           at java.base/java.io.FileOutputStream.writeBytes(Native Method)
           at java.base/java.io.FileOutputStream.write(FileOutputStream.java:348)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
           at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
           at java.base/java.io.DataOutputStream.write(DataOutputStream.java:107)
           at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
           at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
           at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
           at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
           at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
           at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
           at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
           at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:835)
    2020-10-21 01:52:40,431 WARN streaming.PipeMapRed: {}
    java.io.IOException: The pipe is being closed
           at java.base/java.io.FileOutputStream.writeBytes(Native Method)
           at java.base/java.io.FileOutputStream.write(FileOutputStream.java:348)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
           at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
           at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
           at java.base/java.io.DataOutputStream.flush(DataOutputStream.java:123)
           at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:532)
           at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:120)
           at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
           at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
           at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
           at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
           at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:835)
    2020-10-21 01:52:40,433 INFO streaming.PipeMapRed: mapRedFinished
    2020-10-21 01:52:40,435 WARN streaming.PipeMapRed: {}
    java.io.IOException: The pipe is being closed
           at java.base/java.io.FileOutputStream.writeBytes(Native Method)
           at java.base/java.io.FileOutputStream.write(FileOutputStream.java:348)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
           at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
           at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
           at java.base/java.io.DataOutputStream.flush(DataOutputStream.java:123)
           at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:532)
           at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
           at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
           at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
           at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
           at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
           at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:835)
    2020-10-21 01:52:40,438 INFO streaming.PipeMapRed: mapRedFinished
    2020-10-21 01:52:40,445 INFO mapred.LocalJobRunner: map task executor complete.
    2020-10-21 01:52:40,506 WARN mapred.LocalJobRunner: job_local105213688_0001
    java.lang.Exception: java.io.IOException: The pipe has been ended
           at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
           at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
    Caused by: java.io.IOException: The pipe has been ended
           at java.base/java.io.FileOutputStream.writeBytes(Native Method)
           at java.base/java.io.FileOutputStream.write(FileOutputStream.java:348)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
           at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
           at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
           at java.base/java.io.DataOutputStream.write(DataOutputStream.java:107)
           at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
           at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
           at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
           at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
           at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
           at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
           at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
           at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:835)
    2020-10-21 01:52:40,596 INFO mapreduce.Job: Job job_local105213688_0001 failed with state FAILED due to: NA
    2020-10-21 01:52:40,625 INFO mapreduce.Job: Counters: 0
    2020-10-21 01:52:40,625 ERROR streaming.StreamJob: Job not successful!
    Streaming Command Failed!
    
Input file: inputfile.txt (see the split path in the log above).

My Python code:

Mapper:

    #!/usr/bin/python
    
    import sys
    d={}
    for line in sys.stdin:
        line= line.strip()
        words=line.split()
        for word in words:
    
Reducer:

    import sys
    cur_word= None
    cur_cnt=0
    
    for line in sys.stdin:
        line=line.strip().split(',')
        # print(line)
        word, count= line[0], (int)(line[1])
    
        if cur_word==None:
            cur_word=word
            cur_cnt=count
        elif cur_word==word:
            cur_cnt+=count
        else:
            print(cur_word,',', cur_cnt)
            cur_word=word
            cur_cnt=count
    print(cur_word,",",count)
    
    

I have done all of this, but the problem persists.

Your mapper seems to be incomplete: it is missing the

    print word

step, so it never emits any key/value pairs.
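A minimal complete mapper that fits the reducer above (it emits comma-separated word,count pairs, which is exactly what the reducer splits on) could look like the sketch below; this is illustrative, not the code from the post:

    #!/usr/bin/python
    # Word-count mapper (sketch): aggregate counts per input split in a
    # dict and emit "word,count" lines on stdout for the reducer.
    import sys

    d = {}
    for line in sys.stdin:
        for word in line.strip().split():
            d[word] = d.get(word, 0) + 1

    for word, count in d.items():
        print(word, count, sep=',')

The same idea works without the dict by printing word,1 for every word, since the reducer already sums the counts per word.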
