Java: Using MultipleOutputs without context.write results in empty files
I don't understand how to use the MultipleOutputs class. I am using it to create multiple output files. Below is a snippet from my driver class:
Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
job.setJarByClass(CustomKeyValueTest.class);//class with mapper and reducer
job.setOutputKeyClass(CustomKey.class);
job.setOutputValueClass(Text.class);
job.setMapOutputKeyClass(CustomKey.class);
job.setMapOutputValueClass(CustomValue.class);
job.setMapperClass(CustomKeyValueTestMapper.class);
job.setReducerClass(CustomKeyValueTestReducer.class);
job.setInputFormatClass(TextInputFormat.class);
Path in = new Path(args[1]);
Path out = new Path(args[2]);
out.getFileSystem(conf).delete(out, true);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
MultipleOutputs.addNamedOutput(job, "islnd" , TextOutputFormat.class, CustomKey.class, Text.class);
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
MultipleOutputs.setCountersEnabled(job, true);
boolean status = job.waitForCompletion(true);
In the reducer, I use MultipleOutputs like this:
private MultipleOutputs<CustomKey, Text> multipleOutputs;

@Override
public void setup(Context context) throws IOException, InterruptedException {
    multipleOutputs = new MultipleOutputs<>(context);
}

@Override
public void reduce(CustomKey key, Iterable<CustomValue> values, Context context) throws IOException, InterruptedException {
    ...
    multipleOutputs.write("islnd", key, pop, key.toString());
    //context.write(key, pop);
}

public void cleanup() throws IOException, InterruptedException {
    multipleOutputs.close();
}
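When I use context.write, I get output files that contain data. But when I remove context.write, the output files are empty. I don't want to call context.write, because it creates the extra file part-r-00000. As stated in the class documentation (last paragraph of the class description), I use LazyOutputFormat to avoid the part-r-00000 file, but it still doesn't work.

LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class); means: if you do not produce any output, do not create empty files.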
Can you please look at the Hadoop counters and find
1. map.output.records
2. reduce.input.groups
3. reduce.input.records
to verify whether your mappers are sending any data to the reducers?
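One more thing that may be worth checking: the reducer in the question declares `public void cleanup()` with no parameters and no `@Override`. Hadoop's `Reducer` calls `cleanup(Context)`, so a no-argument method is a separate overload that the framework never invokes, and `multipleOutputs.close()` would then never run. That could plausibly explain the empty files, although it is not confirmed here. The sketch below reproduces the mechanism in plain Java with no Hadoop dependency. `Task`, `Context`, and `BufferedSink` are hypothetical stand-ins invented for the illustration, not Hadoop classes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class CleanupSignatureDemo {

    // Hypothetical stand-in for Reducer.Context (not a Hadoop class).
    static class Context {}

    // Hypothetical, simplified stand-in for a framework base class whose
    // run() calls setup(Context), the work, then cleanup(Context).
    static abstract class Task {
        protected void setup(Context context) throws IOException {}
        protected void cleanup(Context context) throws IOException {}
        protected abstract void work(Context context) throws IOException;

        final void run(Context context) throws IOException {
            setup(context);
            try {
                work(context);
            } finally {
                cleanup(context);
            }
        }
    }

    // Buffers writes in memory; data reaches the target only on close().
    static class BufferedSink {
        private final StringWriter target = new StringWriter();
        private final BufferedWriter writer = new BufferedWriter(target);

        void write(String s) throws IOException { writer.write(s); }
        void close() throws IOException { writer.close(); }
        String contents() { return target.toString(); }
    }

    // Mirrors the question: cleanup() without a Context is an overload,
    // so run() never calls it and the sink is never closed.
    static class OverloadedCleanupTask extends Task {
        final BufferedSink sink = new BufferedSink();

        @Override
        protected void work(Context context) throws IOException {
            sink.write("key\tvalue\n");
        }

        public void cleanup() throws IOException { sink.close(); }
    }

    // Same task, but cleanup(Context) really overrides the base method.
    static class OverriddenCleanupTask extends Task {
        final BufferedSink sink = new BufferedSink();

        @Override
        protected void work(Context context) throws IOException {
            sink.write("key\tvalue\n");
        }

        @Override
        protected void cleanup(Context context) throws IOException { sink.close(); }
    }

    public static String runOverloaded() {
        try {
            OverloadedCleanupTask task = new OverloadedCleanupTask();
            task.run(new Context());
            return task.sink.contents();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static String runOverridden() {
        try {
            OverriddenCleanupTask task = new OverriddenCleanupTask();
            task.run(new Context());
            return task.sink.contents();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cleanup() length: " + runOverloaded().length());
        System.out.println("cleanup(Context) length: " + runOverridden().length());
    }
}
```

If this is the cause, the change in the actual reducer would be to declare the method as `cleanup(Context context)` and annotate it with `@Override`, so that the compiler rejects a signature that does not match the base class.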
The code used for MultipleOutputs is