Java: Empty output files when using MapReduce MultipleOutputs


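I am using MultipleOutputs in my Reducer because I want a separate result file for each key. However, every one of those per-key files is empty, even though the default result file part-r-xxxx is created and contains the correct values.

Here is my JobDriver and Reducer code.

Main class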

public static void main(String[] args) throws Exception {
    int currentIteration = 0;
    int reducerCount, roundCount;

    Configuration conf = createConfiguration(currentIteration);
    cleanEnvironment(conf);
    Job job = new Job(conf, "cfim");

    //Input and output format configuration
    job.setMapperClass(TransactionsMapper.class);
    job.setReducerClass(PatriciaReducer.class);

    job.setInputFormatClass(TransactionInputFormat.class);
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    reducerCount = roundCount = Math.floorDiv(getRoundCount(conf), Integer.parseInt(conf.get(MRConstants.mergeFactorSpecifier)));

    FileInputFormat.addInputPath(job, new Path("/home/cloudera/datasets/input"));
    Path outputPath = new Path(String.format(MRConstants.outputPathFormat, outputDir, currentIteration));
    FileOutputFormat.setOutputPath(job, outputPath);
    MultipleOutputs.addNamedOutput(job, "key", TextOutputFormat.class, LongWritable.class, Text.class);

    job.waitForCompletion(true);
}

Reducer class

public class PatriciaReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

private ITreeManager treeManager;
private SerializationManager serializationManager;
private MultipleOutputs<LongWritable, Text> mos;

@Override 
protected void setup(Context context) throws IOException ,InterruptedException {
    treeManager = new PatriciaTreeManager();
    serializationManager = new SerializationManager();
    mos = new MultipleOutputs<LongWritable, Text>(context);
}

@Override
protected void reduce(LongWritable key, Iterable<Text> items, Context context)
        throws IOException, InterruptedException {

    Iterator<Text> patriciaIterator = items.iterator();
    PatriciaTree tree = new PatriciaTree();

    if (patriciaIterator.hasNext()){
        Text input = patriciaIterator.next();
        tree = serializationManager.deserializePatriciaTree(input.toString());
    }

    while(patriciaIterator.hasNext()){
        Text input = patriciaIterator.next();
        PatriciaTree mergeableTree = serializationManager.deserializePatriciaTree(input.toString());
        tree = treeManager.mergeTree(tree, mergeableTree, false);
    }

    Text outputValue = new Text(serializationManager.serializeAsJson(tree));
    mos.write("key", key, outputValue, generateOutputPath(key));
    context.write(key, outputValue);
}

@Override
protected void finalize() throws Throwable {
    // TODO Auto-generated method stub
    super.finalize();
    mos.close();
}

private String generateOutputPath(LongWritable key) throws IOException {
    String outputPath = String.format("%s-%s", MRConstants.reduceResultValue, key.toString());
    return outputPath;
}   
}


Am I doing something wrong?

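I found that I was using the wrong method to close the MultipleOutputs object. After I closed MultipleOutputs in the Reducer's cleanup method instead of in finalize, everything worked fine.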
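To illustrate why the per-key files were probably empty, here is a minimal sketch that uses only the JDK, not Hadoop. It assumes the named outputs behave like ordinary buffered writers: records sit in memory until the writer is closed, and `finalize()` is not guaranteed to run before the task ends, so a `close()` placed there may never flush anything. The `Task` class and its `setup`/`reduce`/`cleanup` methods are hypothetical stand-ins for the reducer lifecycle, and `bytesOnDisk` is a helper invented for this demonstration. In the actual reducer, the equivalent fix is to override `cleanup(Context)` and call `mos.close()` there.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseInCleanupDemo {

    /** Hypothetical stand-in for a reducer task that owns one buffered output. */
    static class Task {
        private BufferedWriter out;

        // Mirrors Reducer.setup: open the output once per task.
        void setup(Path file) throws IOException {
            out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        }

        // Mirrors Reducer.reduce: the record only goes into the in-memory buffer.
        void reduce(long key, String value) throws IOException {
            out.write(key + "\t" + value);
            out.newLine();
        }

        // Mirrors Reducer.cleanup: closing flushes the buffer to the file.
        void cleanup() throws IOException {
            out.close();
        }
    }

    /** Returns the file size observed at "task end", with or without cleanup. */
    static long bytesOnDisk(boolean closeInCleanup) throws IOException {
        Path file = Files.createTempFile("named-output", ".txt");
        Task task = new Task();
        try {
            task.setup(file);
            task.reduce(1L, "serialized-tree");
            if (closeInCleanup) {
                task.cleanup();
            }
            return Files.size(file);
        } finally {
            // Release the handle (a second close is a no-op) and remove the temp file.
            task.cleanup();
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("without cleanup: " + bytesOnDisk(false) + " bytes");
        System.out.println("with cleanup:    " + bytesOnDisk(true) + " bytes");
    }
}
```

Without the explicit close, the small record never leaves the buffer and the file stays empty, which matches the symptom in the question. With the close in `cleanup`, the data reaches the file before the task finishes.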

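This question seems a bit off-topic. What have you done so far to solve it? Have you debugged it or tested specific scenarios?

Well, I noticed that the result files are created but are empty, even though the result itself is not empty. Since I am using the built-in output formats, I only tried searching the web for similar problems.

It seems you have found the solution yourself. Don't forget to mark your answer as accepted. Or consider deleting your question if it is so specific that others won't benefit from it.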