Java: Accessing a static HashMap outside the mapper/reducer classes in MapReduce


java hadoop mapreduce


I'm trying to write a MapReduce program in which the map function adds items to a HashMap, and the reducer then reads those items and writes them to the output:

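import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MyClass {
    // Static map that the mapper is meant to fill and the reducer is meant to read
    static HashMap<String, Integer> temp = new HashMap<String, Integer>();

    public static class Map1 extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context) {
            temp.put("1", 1);
        }
    }

    public static class Reduce1 extends Reducer<Text, IntWritable, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            Iterator<Map.Entry<String, Integer>> it = temp.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Integer> pair = it.next();
                // Renamed from "key", which is already the method parameter
                String k = pair.getKey();
                String val = Integer.toString(pair.getValue());
                context.write(new Text(k), new Text(val));
            }
        }
    }
}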


This compiles fine, but the reducer's output is empty. I'm not very good at Java, so I'm not sure what's going wrong here.

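You may be missing the fact that MapReduce is a distributed algorithm that runs across many processes and potentially many machines.

Each map and reduce task typically runs in its own JVM, which loads MyClass separately, so the static temp map in each task starts out empty and is never shared over the lifetime of the job.

Also, your mapper does nothing. Data is sent to the reducer by calling context.write. Because your mapper never emits anything, the reducer receives no keys and reduce() is never called, which is why the output is empty.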

For WordCount example code, see the official Hadoop documentation.
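To make the data flow concrete without a Hadoop cluster, here is a minimal plain-Java sketch of a WordCount-style job. It is a hypothetical simulation, not Hadoop's API: the names MiniMapReduce, map, reduce and run are made up for illustration, the BiConsumer passed to map stands in for context.write, and the grouping step in run stands in for Hadoop's shuffle. What it shows is that the reducer only ever sees what the mapper explicitly emitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical in-memory simulation of map -> shuffle -> reduce (not the Hadoop API).
public class MiniMapReduce {

    // Mapper: called once per input line; emits (word, 1) pairs through the
    // emitter (playing the role of context.write) instead of touching shared state.
    static void map(String line, BiConsumer<String, Integer> emit) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                emit.accept(word, 1);
            }
        }
    }

    // Reducer: called once per key with every value emitted for that key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // "Framework": runs the mapper, groups emitted pairs by key (the shuffle),
    // then calls the reducer once per key. Keys never emitted never reach reduce().
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String line : lines) {
            map(line, (k, v) -> shuffled.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            output.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog");
        System.out.println(run(input)); // {brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

In a real Hadoop job the same roles are played by Mapper.map calling context.write(new Text(word), new IntWritable(1)) and Reducer.reduce summing the IntWritable values; the framework, not a static field, carries the pairs from one to the other.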

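I suggest reading the WordCount example first to understand what MapReduce does. The Reducer and Mapper run independently and have no shared state (at least not in this sense).

For the record, nobody I know actually writes MapReduce. Most people doing this kind of thing use Spark or Flink.

That's what I suspected, thanks for the answer! Believe me, I wouldn't be using MapReduce if I had a choice.

Even if the state were shared, by default you would be calling temp.put once for every line of the input file. If you only want a count, you can use a counter obtained from the context.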
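The per-line point can be checked in plain Java. The sketch below is a hypothetical simulation, not Hadoop code: putPerLine mimics the question's mapper by calling temp.put("1", 1) once per record, while countPerLine increments a counter once per record instead. In a real job, the counting variant would correspond to incrementing a Hadoop counter from map(), for example context.getCounter("MyGroup", "LINES").increment(1), where the group and counter names are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation: map() is invoked once per input line (record).
public class PerLineCalls {

    // Mimics the question's mapper: every call stores the same key/value,
    // so repeated calls overwrite one entry instead of accumulating anything.
    static Map<String, Integer> putPerLine(List<String> lines) {
        Map<String, Integer> temp = new HashMap<>();
        for (String ignored : lines) {
            temp.put("1", 1);
        }
        return temp;
    }

    // Counting instead: one increment per record, analogous to what a
    // framework counter incremented inside map() would report.
    static long countPerLine(List<String> lines) {
        long counter = 0;
        for (String ignored : lines) {
            counter++;
        }
        return counter;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "b", "c");
        System.out.println(putPerLine(lines));   // {1=1}
        System.out.println(countPerLine(lines)); // 3
    }
}
```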