
Hadoop MapReduce reducer size error


I am writing a simple MapReduce program that counts how many times each line appears in the input. My goal is to check whether two directories contain the same data, so in the reduce phase I check that every key appears exactly twice (once from each input directory).

Here is my code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ResultsValidator extends Configured implements Tool {

    // Emits every input row with a count of 1.
    public static class TuplesScanner extends Mapper<BytesWritable, NullWritable, BytesWritable, LongWritable> {

        private final LongWritable one = new LongWritable(1);

        @Override
        public void map(BytesWritable row, NullWritable ignored, Context context) throws IOException, InterruptedException {
            context.write(row, one);
        }
    }

    // Pre-aggregates counts on the map side to cut shuffle volume.
    public static class TuplesCombiner extends Reducer<BytesWritable, LongWritable, BytesWritable, LongWritable> {

        @Override
        public void reduce(BytesWritable row, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable value : values) {
                sum += value.get();
            }
            context.write(row, new LongWritable(sum));
        }
    }

    // Emits only rows whose total count is not exactly 2, i.e. rows
    // that do not appear once in each of the two input directories.
    public static class TuplesReducer extends Reducer<BytesWritable, LongWritable, BytesWritable, NullWritable> {

        @Override
        public void reduce(BytesWritable row, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable value : values) {
                sum += value.get();
            }
            if (sum != 2) {
                context.write(row, NullWritable.get());
            }
        }
    }

    // Configures and submits the validation job.
    public int run(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Job job = Job.getInstance(getConf());

        Path inputDir0 = new Path(args[0]);
        Path inputDir1 = new Path(args[1]);
        Path outputDir = new Path(args[2]);
        int reducersNum = Integer.parseInt(args[3]);
        if (outputDir.getFileSystem(getConf()).exists(outputDir)) {
          throw new IOException("Output directory " + outputDir + 
                                " already exists.");
        }
        FileInputFormat.addInputPath(job, inputDir0);
        FileInputFormat.addInputPath(job, inputDir1);
        FileOutputFormat.setOutputPath(job, outputDir);
        job.setJobName("ResultsValidator");
        job.setJarByClass(ResultsValidator.class);
        job.setMapperClass(TuplesScanner.class);
        job.setCombinerClass(TuplesCombiner.class);
        job.setReducerClass(TuplesReducer.class);
        job.setNumReduceTasks(reducersNum);
        job.setMapOutputKeyClass(BytesWritable.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(BytesWritable.class);
        job.setOutputValueClass(NullWritable.class);
        job.setInputFormatClass(ResultsValidatorInputFormat.class);
        job.setOutputFormatClass(ResultsValidatorOutputFormat.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new ResultsValidator(), args);
        System.exit(res);
    }
}
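Since the driver goes through ToolRunner, generic options such as -D properties are parsed before the positional arguments. A typical invocation (the jar name here is only illustrative) would look something like: hadoop jar results-validator.jar ResultsValidator <inputDir0> <inputDir1> <outputDir> <numReducers>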
I can't figure out why I am getting the wrong numbers in the iterable during the reduce phase. In the logs I see that each reducer gets a number equal to the number of merged shuffles.


Where am I going wrong?

What do you mean by "wrong"? What you are describing is the correct behavior.

I use this code to compare two directories, each containing many files. My goal is to detect whether the directories are identical, where identity is determined by the lines of the files. Each directory contains only unique lines; within a directory, no line appears more than once. In other words, I need to check that every line in the first directory also appears in the second, and vice versa. I map every line and check in the reducer that each key appears exactly twice. The program works on small inputs but breaks once each of my directories exceeds 100MB.
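For debugging, it may help to take the custom ResultsValidatorInputFormat and ResultsValidatorOutputFormat out of the picture and run the same exactly-twice check over plain text lines with the stock TextInputFormat. The sketch below is only an illustration under that assumption (the class names are hypothetical), not the original program:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical text-based variant of the validator, for isolating the
// custom input format from the map/combine/reduce logic.
public class TextLineValidator {

    // Emits every text line with a count of 1.
    public static class LineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        private final LongWritable one = new LongWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(line, one);
        }
    }

    // Map-side pre-aggregation, mirroring TuplesCombiner above.
    public static class LineCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {

        @Override
        protected void reduce(Text line, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable count : counts) {
                sum += count.get();
            }
            context.write(line, new LongWritable(sum));
        }
    }

    // Emits only lines whose total count is not exactly 2.
    public static class LineReducer extends Reducer<Text, LongWritable, Text, NullWritable> {

        @Override
        protected void reduce(Text line, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable count : counts) {
                sum += count.get();
            }
            if (sum != 2) {
                context.write(line, NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "TextLineValidator");
        job.setJarByClass(TextLineValidator.class);
        job.setMapperClass(LineMapper.class);
        job.setCombinerClass(LineCombiner.class);
        job.setReducerClass(LineReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If this variant produces the expected empty output on the large inputs while the original does not, the discrepancy is more likely in the custom input format's record boundaries than in the map/combine/reduce logic itself.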