
Hadoop - Reducer waiting for mapper input?


As described in the title, the following happens when I execute my Hadoop program (debugging it in local mode):

1. The Mapper, the Partitioner, and the RawComparator (OutputKeyComparatorClass) invoked after the map step all process the 10 CSV rows of my test data correctly. However, the functions of the OutputValueGroupingComparatorClass and of the ReduceClass are never executed afterwards.

2. My application looks like this (for space reasons I have left out the implementations of the classes used as configuration parameters, until someone has an idea that involves them):

3. I get the following console output (sorry for the formatting, but somehow this log did not format correctly):

12/05/22 03:51:05 INFO mapred.MapTask: io.sort.mb = 100
12/05/22 03:51:05 INFO mapred.MapTask: data buffer = 79691776/99614720
12/05/22 03:51:05 INFO mapred.MapTask: record buffer = 262144/327680
12/05/22 03:51:06 INFO mapred.JobClient:  map 0% reduce 0%
12/05/22 03:51:11 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967
12/05/22 03:51:12 INFO mapred.JobClient:  map 39% reduce 0%
12/05/22 03:51:14 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967
12/05/22 03:51:15 INFO mapred.MapTask: Starting flush of map output
12/05/22 03:51:15 INFO mapred.MapTask: Finished spill 0
12/05/22 03:51:15 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/05/22 03:51:15 INFO mapred.JobClient:  map 79% reduce 0%
12/05/22 03:51:17 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967
12/05/22 03:51:17 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967
12/05/22 03:51:17 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/05/22 03:51:17 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@35eed0
12/05/22 03:51:17 INFO mapred.ReduceTask: ShuffleRamManager: MemoryLimit=709551680, MaxSingleShuffleLimit=177387920
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging on-disk files
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread waiting: Thread for merging on-disk files
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging in memory files
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Need another 1 map output(s) where 0 is already in progress
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for polling Map Completion Events
12/05/22 03:51:18 INFO mapred.JobClient:  map 100% reduce 0%
12/05/22 03:51:23 INFO mapred.LocalJobRunner: reduce > copy >

import java.util.Date;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

public class RetweetApplication {

    public static int DEBUG = 1;
    static String INPUT = "/home/ema/INPUT-H";
    static String OUTPUT = "/home/ema/OUTPUT-H " + (new Date()).toString();

    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(RetweetApplication.class);

        if (DEBUG > 0) {
            // Force standalone mode: local job runner, local file system, no replication
            conf.set("mapred.job.tracker", "local");
            conf.set("fs.default.name", "file:///");
            conf.set("dfs.replication", "1");
        }

        FileInputFormat.setInputPaths(conf, new Path(INPUT));
        FileOutputFormat.setOutputPath(conf, new Path(OUTPUT));

        //conf.setOutputKeyClass(Text.class);
        //conf.setOutputValueClass(Text.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(Text.class);

        conf.setMapperClass(RetweetMapper.class);
        conf.setPartitionerClass(TweetPartitioner.class);
        conf.setOutputKeyComparatorClass(TwitterValueGroupingComparator.class);
        conf.setOutputValueGroupingComparator(TwitterKeyGroupingComparator.class);
        conf.setReducerClass(RetweetReducer.class);

        conf.setOutputFormat(TextOutputFormat.class);

        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
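
For reference, a value-grouping comparator in the old org.apache.hadoop.mapred API is a RawComparator, usually built on WritableComparator, that decides which map output keys end up in the same reduce() call. The sketch below is only illustrative (it is not my omitted TwitterKeyGroupingComparator, just the general shape such a class takes, assuming plain Text keys):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Hypothetical grouping comparator for Text keys (illustrative only).
// All keys that compare as equal here are fed to a single reduce() call.
public class ExampleGroupingComparator extends WritableComparator {

    protected ExampleGroupingComparator() {
        // true = instantiate Text keys so compare(WritableComparable, ...) is used
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        // Group by the full textual key; a composite-key job would compare
        // only the "natural" part of the key here.
        return ((Text) a).compareTo((Text) b);
    }
}

Such a class is plugged in exactly as in the job setup above, via conf.setOutputValueGroupingComparator(...).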
RetweetApplication (1) [Remote Java Application]    
    OpenJDK Client VM[localhost:5002]   
        Thread [main] (Running) 
        Thread [Thread-2] (Running) 
        Daemon Thread [communication thread] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)  
        Daemon Thread [Thread for merging on-disk files] (Running)  
        Daemon Thread [Thread for merging in memory files] (Running)    
        Daemon Thread [Thread for polling Map Completion Events] (Running)  
The bold-marked lines keep repeating endlessly from this point on.

4. After the mapper has seen every tuple, lots of open processes remain active (see the debugger thread list above).

Is there any reason why Hadoop expects more output from the mapper (see the bold-marked lines in the log) than the input directory provides? As stated before, I debugged that all input is processed correctly in the mapper/partitioner/etc.

UPDATE

With Chris's help (see the comments) I found out that my program is not started in local mode as I expected: the isLocal variable in the ReduceTask class is set to false, although it should be true.

It is completely unclear to me why this happens, because the three options that have to be set to enable standalone mode are set correctly. Surprisingly, while the local setting is ignored, the "read from normal disk" setting is not, which seems very strange to me since I thought local mode and the file:// protocol were coupled.
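
A quick way to check what the job actually sees is to dump the relevant properties from the JobConf just before JobClient.runJob(conf). The helper below is only an illustrative sketch (it is not part of my program), assuming the Hadoop 1.x property names used above; as far as I understand it, both the choice of the LocalJobRunner and the isLocal flag in ReduceTask come down to mapred.job.tracker resolving to "local":

import org.apache.hadoop.mapred.JobConf;

// Hypothetical diagnostic helper, not part of the original application.
// Dumps the properties that decide whether the job runs in standalone mode.
public class LocalModeCheck {

    public static void dump(JobConf conf) {
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        // Local execution is expected when the job tracker address is "local".
        boolean expectLocal = "local".equals(conf.get("mapred.job.tracker", "local"));
        System.out.println("expecting local mode = " + expectLocal);
    }
}

If these print the cluster values instead of the overrides from the DEBUG block, the overrides were discarded; the "attempt to override final parameter: fs.default.name / mapred.job.tracker; Ignoring." warnings in the log further down would point in that direction.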

During debugging I set the isLocal variable to true by evaluating isLocal=true in the debug view, and then tried to execute the rest of the program. It did not work; this is the stacktrace:

12/05/22 14:28:28 INFO mapred.LocalJobRunner: 
12/05/22 14:28:28 INFO mapred.Merger: Merging 1 sorted segments
12/05/22 14:28:28 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1956 bytes
12/05/22 14:28:28 INFO mapred.LocalJobRunner: 
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
12/05/22 14:28:30 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 0 time(s).
12/05/22 14:28:31 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 1 time(s).
12/05/22 14:28:32 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 2 time(s).
12/05/22 14:28:33 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 3 time(s).
12/05/22 14:28:34 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 4 time(s).
12/05/22 14:28:35 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 5 time(s).
12/05/22 14:28:36 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 6 time(s).
12/05/22 14:28:37 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 7 time(s).
12/05/22 14:28:38 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 8 time(s).
12/05/22 14:28:39 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 9 time(s).
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
12/05/22 14:28:39 WARN mapred.LocalJobRunner: job_local_0001
java.net.ConnectException: Call to master/127.0.0.1:9001 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
    at org.apache.hadoop.ipc.Client.call(Client.java:1071)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:446)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:490)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
    at org.apache.hadoop.ipc.Client.call(Client.java:1046)
    ... 17 more
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
12/05/22 14:28:39 INFO mapred.JobClient: Job complete: job_local_0001
12/05/22 14:28:39 INFO mapred.JobClient: Counters: 20
12/05/22 14:28:39 INFO mapred.JobClient:   File Input Format Counters 
12/05/22 14:28:39 INFO mapred.JobClient:     Bytes Read=967
12/05/22 14:28:39 INFO mapred.JobClient:   FileSystemCounters
12/05/22 14:28:39 INFO mapred.JobClient:     FILE_BYTES_READ=14093
12/05/22 14:28:39 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=47859
12/05/22 14:28:39 INFO mapred.JobClient:   Map-Reduce Framework
12/05/22 14:28:39 INFO mapred.JobClient:     Map output materialized bytes=1960
12/05/22 14:28:39 INFO mapred.JobClient:     Map input records=10
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/05/22 14:28:39 INFO mapred.JobClient:     Spilled Records=10
12/05/22 14:28:39 INFO mapred.JobClient:     Map output bytes=1934
12/05/22 14:28:39 INFO mapred.JobClient:     Total committed heap usage (bytes)=115937280
12/05/22 14:28:39 INFO mapred.JobClient:     CPU time spent (ms)=0
12/05/22 14:28:39 INFO mapred.JobClient:     Map input bytes=967
12/05/22 14:28:39 INFO mapred.JobClient:     SPLIT_RAW_BYTES=82
12/05/22 14:28:39 INFO mapred.JobClient:     Combine input records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce input records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce input groups=0
12/05/22 14:28:39 INFO mapred.JobClient:     Combine output records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce output records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/05/22 14:28:39 INFO mapred.JobClient:     Map output records=10
12/05/22 14:28:39 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
    at uni.kassel.macek.rtprep.RetweetApplication.main(RetweetApplication.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)