Java Hadoop MapReduce: context.write changes values
I'm new to Hadoop and to writing MapReduce jobs, and I've run into a problem: the reducer's context.write method is writing incorrect values, even though the values are correct right before the call.

What should the MapReduce job do?

- Count the total number of words (int wordCount)
- Count the number of distinct words (int counter_dist)
- Count the number of words starting with "z" or "Z" (int counter_startZ)
- Count the number of words that appear fewer than 4 times (int counter_less4)

All of this must be done in a single MapReduce job. The text file being analyzed:
Hello how zou zou zou zou how are you
Correct output:
wordCount = 9
counter_dist = 5
counter_startZ = 4
counter_less4 = 4
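Those expected numbers can be double-checked in plain Java, independent of Hadoop. The sketch below is a standalone illustration (not part of the job) that computes all four counts for the sample line:

```java
import java.util.HashMap;
import java.util.Map;

public class ExpectedCounts {
    public static void main(String[] args) {
        String line = "Hello how zou zou zou zou how are you";
        Map<String, Integer> freq = new HashMap<>();
        int wordCount = 0;      // total words
        int counter_startZ = 0; // words starting with "z" or "Z"

        for (String w : line.split("\\s+")) {
            wordCount++;
            if (w.toUpperCase().startsWith("Z")) {
                counter_startZ++;
            }
            freq.merge(w, 1, Integer::sum); // per-word frequency
        }

        int counter_dist = freq.size(); // distinct words
        long counter_less4 = freq.values().stream().filter(c -> c < 4).count();

        System.out.println("wordCount=" + wordCount
                + " counter_dist=" + counter_dist
                + " counter_startZ=" + counter_startZ
                + " counter_less4=" + counter_less4);
        // Prints: wordCount=9 counter_dist=5 counter_startZ=4 counter_less4=4
    }
}
```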
Mapper class:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String hasKey = itr.nextToken();
            word.set(hasKey);
            context.write(word, one);
        }
    }
}
From the logs I can see that all the values are correct and everything works fine. But when I open the output directory in HDFS and read the "part-r-00000" file, the output written by context.write is completely different:
Total words: 22
Distinct words: 4
Starts with Z: 0
Appears less than 4 times: 4
Never rely on the cleanup() method for critical program logic. cleanup() is called once per JVM that is torn down, so depending on how many JVMs get spawned and killed (which you cannot predict), your logic becomes unstable.
Move both the initialization and the writes to the context into the reduce() method.
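The advice of keeping state local to each reduce() call can be simulated without Hadoop: if a per-key sum is initialized and emitted inside the reduce call itself, the result cannot depend on how keys are split across reducer JVMs. A rough plain-Java illustration (class and method names are mine, not from the post):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerKeyReduce {
    // Simulates one reduce() call: state is created and returned per key,
    // so nothing can leak across keys or across reducer JVMs.
    static int reduce(List<Integer> values) {
        int sum = 0;                  // initialized inside "reduce", not in setup()
        for (int v : values) {
            sum += v;
        }
        return sum;                   // emitted inside "reduce", not in cleanup()
    }

    public static void main(String[] args) {
        // Group the sample words by key, like the shuffle phase would.
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (String w : "Hello how zou zou zou zou how are you".split(" ")) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        grouped.forEach((k, v) -> System.out.println(k + "=" + reduce(v)));
    }
}
```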
Edit: Based on the OP's comments, the whole logic seems flawed. Below is code that achieves the desired result. Note that I have not implemented setup() or cleanup(), because they are simply not needed.

Use counters to count what you are looking for. Once the MapReduce job finishes, just fetch the counters in the driver class. For example, the number of words starting with "z" or "Z" can be counted in the mapper:
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String hasKey = itr.nextToken();
            word.set(hasKey);
            context.getCounter("my_counters", "TOTAL_WORDS").increment(1);
            if (hasKey.toUpperCase().startsWith("Z")) {
                context.getCounter("my_counters", "Z_WORDS").increment(1);
            }
            context.write(word, one);
        }
    }
}
Fetch the counters in the Driver class. The code below goes right after the line where you submit the job:
CounterGroup group = job.getCounters().getGroup("my_counters");
for (Counter counter : group) {
    System.out.println(counter.getName() + "=" + counter.getValue());
}
Unfortunately, this is not what I want. I just need the output to contain 4 lines with the correctly counted values (the "correct output" from the question). This solution adds 4 lines to the output every time the reduce method runs.

Writing the code logic in the cleanup method is a fundamental mistake. You have to understand what the setup and cleanup methods do, especially since Hadoop spawns a new JVM for each reducer. If the fix above does not work, it means your logic needs to change. I have added the complete code logic; hope you can follow it. Let me know.

This seems like a strange thing; try debugging your code and look at the variables!
int wordCount = 0; // Total number of words
int counter_dist = 0; // Number of distinct words in the corpus
int counter_startZ = 0; // Number of words that start with letter Z
int counter_less4 = 0; // Number of words that appear less than 4 times
context.write(new Text("Total words: "), new IntWritable(wordCount));
context.write(new Text("Distinct words: "), new IntWritable(counter_dist));
context.write(new Text("Starts with Z: "), new IntWritable(counter_startZ));
context.write(new Text("Appears less than 4 times:"), new IntWritable(counter_less4));
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String hasKey = itr.nextToken();
            word.set(hasKey);
            context.getCounter("my_counters", "TOTAL_WORDS").increment(1);
            if (hasKey.toUpperCase().startsWith("Z")) {
                context.getCounter("my_counters", "Z_WORDS").increment(1);
            }
            context.write(word, one);
        }
    }
}
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int wordCount = 0;
        context.getCounter("my_counters", "DISTINCT_WORDS").increment(1);
        for (IntWritable val : values) {
            wordCount += val.get();
        }
        if (wordCount < 4) {
            context.getCounter("my_counters", "WORDS_LESS_THAN_4").increment(1);
        }
    }
}
CounterGroup group = job.getCounters().getGroup("my_counters");
for (Counter counter : group) {
    System.out.println(counter.getName() + "=" + counter.getValue());
}
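To see that this counter-based approach yields the numbers from the question, the whole map → shuffle → reduce flow can be simulated in plain Java. This is a sketch with plain maps standing in for Hadoop's counter group (not the Hadoop API); the counter names mirror the answer's code:

```java
import java.util.HashMap;
import java.util.Map;

public class CounterPipelineSim {
    // Simulates the map -> shuffle -> reduce flow with simulated counters.
    static Map<String, Long> run(String line) {
        Map<String, Long> counters = new HashMap<>();   // stands in for "my_counters"
        Map<String, Integer> grouped = new HashMap<>(); // shuffle/group by key

        // Map phase: mirrors WordCountMapper's increments and context.write.
        for (String word : line.split("\\s+")) {
            counters.merge("TOTAL_WORDS", 1L, Long::sum);
            if (word.toUpperCase().startsWith("Z")) {
                counters.merge("Z_WORDS", 1L, Long::sum);
            }
            grouped.merge(word, 1, Integer::sum);
        }

        // Reduce phase: mirrors WordCountReducer, one call per distinct key.
        for (int wordCount : grouped.values()) {
            counters.merge("DISTINCT_WORDS", 1L, Long::sum);
            if (wordCount < 4) {
                counters.merge("WORDS_LESS_THAN_4", 1L, Long::sum);
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        run("Hello how zou zou zou zou how are you")
            .forEach((name, value) -> System.out.println(name + "=" + value));
        // TOTAL_WORDS=9, DISTINCT_WORDS=5, Z_WORDS=4, WORDS_LESS_THAN_4=4
    }
}
```

The final counter values match the "correct output" in the question, which is why fetching counters in the driver avoids writing extra rows from reduce().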