Java: How to get the current file name in a Hadoop Reduce


java hadoop


I'm using this example, and in the Reduce function I need to get the file name:

public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    String filename = ((FileSplit)(.getContext()).getInputSplit()).getPath().getName();
    // ----------------------------^ I need to get the context and filename!
    key.set(key.toString() + " (" + filename + ")");
    output.collect(key, new IntWritable(sum));
  }
}


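This is the code above as I've currently modified it. Here I want to get the name of the file each word came from. I tried this, but I can't get the context object.

I'm new to Hadoop and need help with this. Any ideas?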

You can't get a context, because Context is a "new API" construct and you are using the "old API".

Take a look at the following word count example:

In it, look at the signature of the reduce function:

public void reduce(Text key, Iterable<IntWritable> values, Context context)


Voilà! A context! Note that in this example it is imported from org.apache.hadoop.mapreduce, not from org.apache.hadoop.mapred.

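This is a common problem for new Hadoop users, so don't feel bad. In general you want to stick with the new API, for a number of reasons. But be very careful with the examples you find. Also, be aware that the new API and the old API are not interoperable (for example, you can't combine a new-API mapper with an old-API reducer).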

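With the old MR API (org.apache.hadoop.mapred package), add the following to the mapper class (the framework sets map.input.file only for map tasks; a reducer has no single input file):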

private String fileName;

@Override
public void configure(JobConf job)
{
    // Full path of the file this map task is reading (null in a reducer).
    fileName = job.get("map.input.file");
}


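With the new MR API (org.apache.hadoop.mapreduce package), add the following to the mapper class (Mapper.Context provides getInputSplit(); the reducer's context does not):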


private String fileName;

@Override
protected void setup(Context context) throws java.io.IOException, java.lang.InterruptedException
{
    // FileSplit here is org.apache.hadoop.mapreduce.lib.input.FileSplit
    fileName = ((FileSplit) context.getInputSplit()).getPath().toString();
}
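Note that the question calls getPath().getName() while the setup() snippet calls getPath().toString(). As far as I know, Hadoop's Path.toString() returns the whole (usually fully qualified) path, while Path.getName() returns only its final component. The plain-Java sketch below approximates that difference with java.net.URI; it is not Hadoop code, and the class name SplitFileName, the helper lastComponent, and the hdfs:// path are all hypothetical.

```java
import java.net.URI;

// Plain-Java approximation (no Hadoop dependency) of the difference between
// Path.toString() (full path) and Path.getName() (final component only).
public class SplitFileName {

    // Hypothetical helper: returns the text after the last '/' in the URI's
    // path part, which is what Path.getName() is understood to return.
    static String lastComponent(String pathUri) {
        String path = URI.create(pathUri).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Hypothetical input file location.
        String full = "hdfs://namenode:8020/user/me/input/part-a.txt";
        System.out.println(full);                // roughly what getPath().toString() gives
        System.out.println(lastComponent(full)); // roughly what getPath().getName() gives
    }
}
```

Use getName() if you only want the bare file name in your output keys; toString() keeps the directory, which helps when different directories contain files with the same name.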

I did it this way, and it works:

// Old API (org.apache.hadoop.mapred); FileSplit here is org.apache.hadoop.mapred.FileSplit
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    // The split does not change within a call, so look it up once, not per token.
    FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
    String filename = fileSplit.getPath().getName();

    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      // Tag each word with its file so the reducer's keys carry the file name.
      word.set(tokenizer.nextToken() + " (" + filename + ")");
      output.collect(word, one);
    }
  }
}
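A reducer receives only keys and their grouped values, not an input split, so one common way to get per-file results on the reduce side is to put the file name into the key on the map side, in the same key + " (" + filename + ")" format the question tries to build in reduce. The plain-Java sketch below simulates that flow without Hadoop: the hypothetical mapSplit method plays one map task over one file, and reduce groups and sums by key. All names are illustrative, not Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop): tag words with their file name in the
// "map" step so the "reduce" step can tell which file each count came from.
public class FileTaggedWordCount {

    // Simulates one map task over one input file: the file name is known once
    // per split and reused for every token.
    static List<Map.Entry<String, Integer>> mapSplit(String fileName, List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String taggedKey = tokenizer.nextToken() + " (" + fileName + ")";
                out.add(Map.entry(taggedKey, 1));
            }
        }
        return out;
    }

    // Simulates shuffle + reduce: group by key (sorted, like Hadoop's keys)
    // and sum the values for each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            sums.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        all.addAll(mapSplit("a.txt", List.of("hello world", "hello")));
        all.addAll(mapSplit("b.txt", List.of("hello hadoop")));
        // prints {hadoop (b.txt)=1, hello (a.txt)=2, hello (b.txt)=1, world (a.txt)=1}
        System.out.println(reduce(all));
    }
}
```

The design point is that the file name only exists on the map side, so it has to travel through the shuffle inside the key (or the value) to be visible in reduce. The sketch uses Map.entry and List.of, so it needs Java 9 or later.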


Let me know if I can improve it.

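Just curious: why prefer the new API over the old one? I thought both APIs would continue to be supported, but maybe I'm not up to date. And how do I get the file name in the reduce function with the old API?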