Hadoop: How to remove the r-00000 extension from reducer output in MapReduce


hadoop mapreduce


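I am able to rename my reducer output files correctly, but the r-00000 suffix is still there. I am using MultipleOutputs in my reducer class. Here are the details. I'm not sure what I'm missing or what else I need to do.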

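import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.log4j.Logger; // assumed: Logger.getLogger(Class) matches log4j

public class MyReducer extends Reducer<NullWritable, Text, NullWritable, Text> {

    private Logger logger = Logger.getLogger(MyReducer.class);
    private MultipleOutputs<NullWritable, Text> multipleOutputs;
    // Base output name passed to MultipleOutputs; empty in the snippet as posted.
    String strName = "";

    @Override
    public void setup(Context context) {
        logger.info("Inside Reducer.");
        multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    public void reduce(NullWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {

        for (Text value : values) {
            final String valueStr = value.toString();
            StringBuilder sb = new StringBuilder();
            // strArrvalueStr is not defined in the snippet as posted; presumably derived from valueStr.
            sb.append(strArrvalueStr[0] + "|!|");
            // The third argument is the base name of the output file.
            multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName);
        }
    }

    @Override
    public void cleanup(Context context) throws IOException,
            InterruptedException {
        multipleOutputs.close();
    }
}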


I was able to do this explicitly after my job finishes, and that is fine for me. It adds no delay to the job.

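if (b) {
    // strDate is computed but not used in the snippet as posted.
    DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd-HHmm");
    Calendar cal = Calendar.getInstance();
    String strDate = dateFormat.format(cal.getTime());
    FileSystem hdfs = FileSystem.get(getConf());
    // args[1] is presumably the job's output directory.
    FileStatus[] fs = hdfs.listStatus(new Path(args[1]));
    if (fs != null) {
        for (FileStatus aFile : fs) {
            // isDirectory() replaces the deprecated isDir().
            if (!aFile.isDirectory()) {
                // Appends ".txt" to every file name in the directory.
                hdfs.rename(aFile.getPath(), new Path(aFile.getPath().toString() + ".txt"));
            }
        }
    }
}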
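
The snippet above appends ".txt" to each file name but leaves the -r-00000 part in place. Below is a minimal sketch, in plain Java so it can be tried without a cluster, of computing a rename target that drops the task suffix instead. The names PartFileNames and stripTaskSuffix are hypothetical, and the pattern assumes the usual <base>-m-NNNNN / <base>-r-NNNNN naming; the returned name would be used to build the target Path passed to hdfs.rename.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: computes rename targets for MapReduce output files. */
final class PartFileNames {

    // Assumed naming: <base>-m-NNNNN or <base>-r-NNNNN, optionally followed by an extension.
    private static final Pattern TASK_SUFFIX =
            Pattern.compile("^(.+)-[mr]-\\d{5,}(\\..*)?$");

    private PartFileNames() {}

    /** Returns the name without the task suffix, or the name unchanged if it has none. */
    static String stripTaskSuffix(String fileName) {
        Matcher m = TASK_SUFFIX.matcher(fileName);
        if (!m.matches()) {
            return fileName; // no task suffix (for example a marker file): leave untouched
        }
        String extension = m.group(2) == null ? "" : m.group(2);
        return m.group(1) + extension;
    }

    public static void main(String[] args) {
        System.out.println(stripTaskSuffix("report-r-00000"));
        System.out.println(stripTaskSuffix("report-r-00000.gz"));
        System.out.println(stripTaskSuffix("_SUCCESS"));
    }
}
```

Note that with more than one reducer, names such as report-r-00000 and report-r-00001 would both map to report, so a rename along these lines seems safe only when each base name is written by a single task.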

A more appropriate solution is to change the OutputFormat.

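For example, if you are using TextOutputFormat, take the source code of the TextOutputFormat class and modify the following method (it is defined in the parent class, FileOutputFormat) so that it produces the file name you want, without r-00000. Then set the modified output format in the driver.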

public synchronized static String getUniqueFile(TaskAttemptContext context, String name, String extension) {
    // Original behavior, disabled below: append "-<task type>-<partition>" and the extension.
    /*TaskID taskId = context.getTaskAttemptID().getTaskID();
    int partition = taskId.getId();*/
    StringBuilder result = new StringBuilder();
    result.append(name);
    /*
     * result.append('-');
     * result.append(TaskID.getRepresentingCharacter(taskId.getTaskType()));
     * result.append('-'); result.append(NUMBER_FORMAT.format(partition));
     * result.append(extension);
     */
    // Only the base name is returned, so the extension is dropped as well.
    return result.toString();
}

This way, whatever name you pass through MultipleOutputs, the file is created with exactly that name.
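
To make the effect of that change concrete, here is a plain-Java sketch of how the default name is put together compared with the modified version. It mirrors the commented-out lines in the method above rather than calling Hadoop; UniqueFileNames, defaultName, and bareName are hypothetical names, and the five-digit zero padding is assumed from the r-00000 suffix in the question.

```java
import java.util.Locale;

/** Hypothetical illustration of the naming logic; it does not use Hadoop classes. */
final class UniqueFileNames {

    private UniqueFileNames() {}

    /** Default scheme: <name>-<task type>-<zero-padded partition><extension>. */
    static String defaultName(String name, char taskType, int partition, String extension) {
        String paddedPartition = String.format(Locale.ROOT, "%05d", partition);
        return name + '-' + taskType + '-' + paddedPartition + extension;
    }

    /** Modified scheme from the answer: only the base name, regardless of task or partition. */
    static String bareName(String name) {
        return name;
    }

    public static void main(String[] args) {
        // Two reducers get distinct names under the default scheme ...
        System.out.println(defaultName("report", 'r', 0, ""));
        System.out.println(defaultName("report", 'r', 1, ""));
        // ... but the same name under the modified scheme.
        System.out.println(bareName("report"));
    }
}
```

Because the modified scheme ignores the partition, every reducer that writes under the same base name would target the same file name. The approach therefore seems safe only with a single reducer, or when each reducer writes under base names that no other reducer uses.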

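I think this question is a duplicate; see the following link:

I have overridden the generateFileName() method, but I cannot remove the r-0000 extension.

How can I do the same for Spark output?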