Java Hadoop MapReduce: context.write changes values


I'm new to Hadoop and to writing MapReduce jobs, and I've hit a problem: the context.write method in my reducer is changing correct values into incorrect ones.

What should the MapReduce job do?

  • Count the total number of words (int wordCount)
  • Count the number of distinct words (int counter_dist)
  • Count the number of words that start with "z" or "Z" (int counter_startZ)
  • Count the number of words that appear fewer than 4 times (int counter_less4)
All of this has to be done in a single MapReduce job.

The text file being analyzed:

Hello how zou zou zou zou how are you
Correct output:
wordCount = 9

counter_dist = 5

counter_startZ = 4

counter_less4 = 4
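
To spell the tallies out: the line has 9 tokens; the 5 distinct words are Hello, how, zou, are and you; only "zou" starts with z/Z and it occurs 4 times; and since "zou" occurs exactly 4 times, the remaining 4 words are the ones appearing fewer than 4 times.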

Mapper class

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            // Emit (word, 1) for every token in the line
            String hasKey = itr.nextToken();
            word.set(hasKey);
            context.write(word, one);
        }
    }
}
From the logs I can see that all the values are correct and everything works. But when I open the output directory in HDFS and read the "part-r-00000" file, the output written there by context.write is completely different:

Total words: 22
Distinct words: 4
Starts with Z: 0
Appears less than 4 times: 4

Never depend on the cleanup() method for critical program logic. cleanup() is called whenever a JVM is torn down, so depending on how many JVMs get spawned and killed (which you cannot predict), your logic becomes unreliable.

Move both the initialization and the context.write calls into the reduce method, i.e.:


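A minimal sketch of that arrangement, with the count initialized and written entirely inside reduce() so that no state has to survive in setup() or cleanup(). Note that this emits one line per key; the edit below supersedes it:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Initialize inside reduce(), not in setup()
        int wordCount = 0;
        for (IntWritable val : values) {
            wordCount += val.get();
        }
        // Write inside reduce(), not in cleanup()
        context.write(key, new IntWritable(wordCount));
    }
}
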
Edit: judging by the OP's comments, the whole logic appears to be flawed.

Below is code that achieves the desired result. Note that I have implemented neither setup() nor cleanup(), because they simply are not needed here.

Use counters to count what you are looking for, and fetch them in the driver class once the MapReduce job has finished.

For example, the number of words starting with "z" or "Z" can be counted in the mapper:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String hasKey = itr.nextToken();
            word.set(hasKey);
            // Every token seen counts towards the total
            context.getCounter("my_counters", "TOTAL_WORDS").increment(1);
            // Case-insensitive check for a leading 'z'
            if (hasKey.toUpperCase().startsWith("Z")) {
                context.getCounter("my_counters", "Z_WORDS").increment(1);
            }
            context.write(word, one);
        }
    }
}
Fetch the counters in the driver class. The code below goes after the line where you submit the job:

CounterGroup group = job.getCounters().getGroup("my_counters");

for (Counter counter : group) {
   System.out.println(counter.getName() + "=" + counter.getValue());
}
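
For context, a rough sketch of a complete driver around that snippet; the class name, I/O paths and output types are placeholder assumptions, not from the original post:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.CounterGroup;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word stats");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Counters are only final once the job has completed
        job.waitForCompletion(true);

        CounterGroup group = job.getCounters().getGroup("my_counters");
        for (Counter counter : group) {
            System.out.println(counter.getName() + "=" + counter.getValue());
        }
    }
}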

Unfortunately, this is not what I am looking for. I just need 4 lines with the correctly counted values in the output (the "Correct output" from the question). This solution adds 4 lines to the output every time the reduce method runs.

Writing your code logic in the cleanup method is the fundamental mistake. You have to understand what the setup and cleanup methods do, especially since Hadoop spawns a new JVM for each reducer. If the fix above does not work, it means your logic needs to change; I have added the complete code logic. Hope you can follow it. Let me know.

That seems like an odd thing to happen. Try debugging your code and look at the variables!
For reference, the reducer from the question kept the counts as fields and wrote them out in cleanup(). A skeleton reconstructed around the posted fragments (reduce(), which updates the four fields, is omitted):

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    int wordCount = 0;       // Total number of words
    int counter_dist = 0;    // Number of distinct words in the corpus
    int counter_startZ = 0;  // Number of words that start with letter Z
    int counter_less4 = 0;   // Number of words that appear less than 4 times

    // ... reduce() updates the four counts per key ...

    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        context.write(new Text("Total words: "), new IntWritable(wordCount));
        context.write(new Text("Distinct words: "), new IntWritable(counter_dist));
        context.write(new Text("Starts with Z: "), new IntWritable(counter_startZ));
        context.write(new Text("Appears less than 4 times:"), new IntWritable(counter_less4));
    }
}
And the reducer that pairs with the counter-based mapper above. Each reduce() call handles one distinct word, so it increments the distinct-word counter once per call and the less-than-4 counter when the total for that word is below 4:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int wordCount = 0;
        // reduce() runs once per distinct word
        context.getCounter("my_counters", "DISTINCT_WORDS").increment(1);
        for (IntWritable val : values) {
            wordCount += val.get();
        }
        if (wordCount < 4) {
            context.getCounter("my_counters", "WORDS_LESS_THAN_4").increment(1);
        }
    }
}
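
With the sample input from the question, the driver then prints the four counters, TOTAL_WORDS=9, Z_WORDS=4, DISTINCT_WORDS=5 and WORDS_LESS_THAN_4=4, matching the correct output asked for in the question.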