Java Hadoop MapReduce: context.write changes values
I'm new to Hadoop and to writing MapReduce jobs, and I've run into a problem: the reducer's context.write method is writing incorrect values, even though the values are correct right before the call.

What should the MapReduce job do?

- Count the total number of words (int wordCount)
- Count the number of distinct words (int counter_dist)
- Count the number of words starting with "z" or "Z" (int counter_startZ)
- Count the number of words that appear fewer than 4 times (int counter_less4)

All of this must be done in a single MapReduce job. The text file being analyzed:
Hello how zou zou zou zou how are you
Correct output:
wordCount = 9
counter_dist = 5
counter_startZ = 4
counter_less4 = 4
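Those expected numbers can be double-checked in plain Java, independent of Hadoop. The sketch below is a standalone illustration (not part of the job) that computes all four counts for the sample line:

```java
import java.util.HashMap;
import java.util.Map;

public class ExpectedCounts {
    public static void main(String[] args) {
        String line = "Hello how zou zou zou zou how are you";
        Map<String, Integer> freq = new HashMap<>();
        int wordCount = 0;      // total words
        int counter_startZ = 0; // words starting with "z" or "Z"

        for (String w : line.split("\\s+")) {
            wordCount++;
            if (w.toUpperCase().startsWith("Z")) {
                counter_startZ++;
            }
            freq.merge(w, 1, Integer::sum); // per-word frequency
        }

        int counter_dist = freq.size(); // distinct words
        long counter_less4 = freq.values().stream().filter(c -> c < 4).count();

        System.out.println("wordCount=" + wordCount
                + " counter_dist=" + counter_dist
                + " counter_startZ=" + counter_startZ
                + " counter_less4=" + counter_less4);
        // Prints: wordCount=9 counter_dist=5 counter_startZ=4 counter_less4=4
    }
}
```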
Mapper class:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String hasKey = itr.nextToken();
            word.set(hasKey);
            context.write(word, one);
        }
    }
}
From the logs I can see that all the values are correct and everything works fine. But when I open the output directory in HDFS and read the "part-r-00000" file, the output written by context.write is completely different:
Total words: 22
Distinct words: 4
Starts with Z: 0
Appears less than 4 times: 4
Never rely on the cleanup() method for critical program logic. cleanup() is called once per JVM that is torn down, so depending on how many JVMs get spawned and killed (which you cannot predict), your logic becomes unstable.
Move both the initialization and the writes to the context into the reduce() method.
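The advice of keeping state local to each reduce() call can be simulated without Hadoop: if a per-key sum is initialized and emitted inside the reduce call itself, the result cannot depend on how keys are split across reducer JVMs. A rough plain-Java illustration (class and method names are mine, not from the post):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerKeyReduce {
    // Simulates one reduce() call: state is created and returned per key,
    // so nothing can leak across keys or across reducer JVMs.
    static int reduce(List<Integer> values) {
        int sum = 0;                  // initialized inside "reduce", not in setup()
        for (int v : values) {
            sum += v;
        }
        return sum;                   // emitted inside "reduce", not in cleanup()
    }

    public static void main(String[] args) {
        // Group the sample words by key, like the shuffle phase would.
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (String w : "Hello how zou zou zou zou how are you".split(" ")) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        grouped.forEach((k, v) -> System.out.println(k + "=" + reduce(v)));
    }
}
```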
Edit: Based on the OP's comments, the whole logic seems flawed. Below is code that achieves the desired result. Note that I have not implemented setup() or cleanup(), because they are simply not needed.

Use counters to count what you are looking for. Once the MapReduce job finishes, just fetch the counters in the driver class. For example, the number of words starting with "z" or "Z" can be counted in the mapper:
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String hasKey = itr.nextToken();
            word.set(hasKey);
            context.getCounter("my_counters", "TOTAL_WORDS").increment(1);
            if (hasKey.toUpperCase().startsWith("Z")) {
                context.getCounter("my_counters", "Z_WORDS").increment(1);
            }
            context.write(word, one);
        }
    }
}
Fetch the counters in the Driver class. The code below goes right after the line where you submit the job:
CounterGroup group = job.getCounters().getGroup("my_counters");
for (Counter counter : group) {
    System.out.println(counter.getName() + "=" + counter.getValue());
}
Unfortunately, this is not what I want. I just need the output to contain 4 lines with the correctly counted values (the "correct output" from the question). This solution adds 4 lines to the output every time the reduce method runs.

Writing the code logic in the cleanup method is a fundamental mistake. You have to understand what the setup and cleanup methods do, especially since Hadoop spawns a new JVM for each reducer. If the fix above does not work, it means your logic needs to change. I have added the complete code logic; hope you can follow it. Let me know.

This seems like a strange thing; try debugging your code and look at the variables!
int wordCount = 0; // Total number of words
int counter_dist = 0; // Number of distinct words in the corpus
int counter_startZ = 0; // Number of words that start with letter Z
int counter_less4 = 0; // Number of words that appear less than 4 times
context.write(new Text("Total words: "), new IntWritable(wordCount));
context.write(new Text("Distinct words: "), new IntWritable(counter_dist));
context.write(new Text("Starts with Z: "), new IntWritable(counter_startZ));
context.write(new Text("Appears less than 4 times:"), new IntWritable(counter_less4));
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String hasKey = itr.nextToken();
            word.set(hasKey);
            context.getCounter("my_counters", "TOTAL_WORDS").increment(1);
            if (hasKey.toUpperCase().startsWith("Z")) {
                context.getCounter("my_counters", "Z_WORDS").increment(1);
            }
            context.write(word, one);
        }
    }
}
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int wordCount = 0;
        context.getCounter("my_counters", "DISTINCT_WORDS").increment(1);
        for (IntWritable val : values) {
            wordCount += val.get();
        }
        if (wordCount < 4) {
            context.getCounter("my_counters", "WORDS_LESS_THAN_4").increment(1);
        }
    }
}
CounterGroup group = job.getCounters().getGroup("my_counters");
for (Counter counter : group) {
    System.out.println(counter.getName() + "=" + counter.getValue());
}
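To see that this counter-based approach yields the numbers from the question, the whole map → shuffle → reduce flow can be simulated in plain Java. This is a sketch with plain maps standing in for Hadoop's counter group (not the Hadoop API); the counter names mirror the answer's code:

```java
import java.util.HashMap;
import java.util.Map;

public class CounterPipelineSim {
    // Simulates the map -> shuffle -> reduce flow with simulated counters.
    static Map<String, Long> run(String line) {
        Map<String, Long> counters = new HashMap<>();   // stands in for "my_counters"
        Map<String, Integer> grouped = new HashMap<>(); // shuffle/group by key

        // Map phase: mirrors WordCountMapper's increments and context.write.
        for (String word : line.split("\\s+")) {
            counters.merge("TOTAL_WORDS", 1L, Long::sum);
            if (word.toUpperCase().startsWith("Z")) {
                counters.merge("Z_WORDS", 1L, Long::sum);
            }
            grouped.merge(word, 1, Integer::sum);
        }

        // Reduce phase: mirrors WordCountReducer, one call per distinct key.
        for (int wordCount : grouped.values()) {
            counters.merge("DISTINCT_WORDS", 1L, Long::sum);
            if (wordCount < 4) {
                counters.merge("WORDS_LESS_THAN_4", 1L, Long::sum);
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        run("Hello how zou zou zou zou how are you")
            .forEach((name, value) -> System.out.println(name + "=" + value));
        // TOTAL_WORDS=9, DISTINCT_WORDS=5, Z_WORDS=4, WORDS_LESS_THAN_4=4
    }
}
```

The final counter values match the "correct output" in the question, which is why fetching counters in the driver avoids writing extra rows from reduce().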