Mapper and Reducer class design in Java MapReduce
I am new to MapReduce, and I have some doubts about how the Mapper and Reducer classes are designed in this code. I am familiar with the map-side join in MapReduce and learned the following:
public static class CustsMapper extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // ...
    }
}
Code:
1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45
package hadoop;

import java.util.*;
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class ProcessUnits
{
    // Mapper class (old org.apache.hadoop.mapred API)
    public static class E_EMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable>
    {
        // Map function: emit (year, last value on the tab-separated line)
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException
        {
            String line = value.toString();
            String lasttoken = null;
            StringTokenizer s = new StringTokenizer(line, "\t");
            String year = s.nextToken();
            while (s.hasMoreTokens())
            {
                lasttoken = s.nextToken();
            }
            int avgprice = Integer.parseInt(lasttoken);
            output.collect(new Text(year), new IntWritable(avgprice));
        }
    }

    // Reducer class
    public static class E_EReduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable>
    {
        // Reduce function: emit every value above the threshold of 30
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException
        {
            int maxavg = 30;
            int val = Integer.MIN_VALUE;
            while (values.hasNext())
            {
                if ((val = values.next().get()) > maxavg)
                {
                    output.collect(key, new IntWritable(val));
                }
            }
        }
    }

    // Main function
    public static void main(String args[]) throws Exception
    {
        JobConf conf = new JobConf(ProcessUnits.class);
        conf.setJobName("max_eletricityunits");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(E_EMapper.class);
        conf.setCombinerClass(E_EReduce.class);
        conf.setReducerClass(E_EReduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
Output:
1981 34
1984 40
1985 45
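To see what the job does without running Hadoop, the mapper's line parsing and the reducer's threshold filter can be sketched in plain Java. The class and method names below are illustrative only, not part of any Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class LogicSketch {
    // Mapper logic: the first token is the year, the last token is the value.
    static String[] parseLine(String line) {
        StringTokenizer s = new StringTokenizer(line, "\t");
        String year = s.nextToken();
        String lasttoken = year;  // fall back if the line has only one token
        while (s.hasMoreTokens()) {
            lasttoken = s.nextToken();
        }
        return new String[] { year, lasttoken };
    }

    // Reducer logic: keep every value strictly above the threshold (30 in the job).
    static List<Integer> filterAbove(List<Integer> values, int threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int v : values) {
            if (v > threshold) kept.add(v);
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] kv = parseLine("1981\t31\t32\t32\t32\t33\t34\t35\t36\t36\t34\t34\t34\t34");
        System.out.println(kv[0] + " -> " + kv[1]);               // 1981 -> 34
        System.out.println(filterAbove(List.of(23, 43, 26, 25), 30)); // [43]
    }
}
```

Note that the reducer emits every value above the threshold per key, so with this logic a year can appear on more than one output line.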
Why do we extend the class from MapReduceBase (what does it do?), and why do we implement the Mapper interface?
Because this is old code, written with the mapred API before Hadoop 2.x existed.
I understand that a Context object should appear here, but what are OutputCollector and Reporter?
They are the earlier version of the Context object.
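For comparison, here is a sketch of the same map logic against the newer org.apache.hadoop.mapreduce API, where a single Context object replaces both OutputCollector (via context.write) and Reporter (via the status/counter methods on context). The class name E_EMapperNew is made up for illustration:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// New-API mapper: extend the Mapper class instead of
// extending MapReduceBase and implementing the Mapper interface.
public class E_EMapperNew extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer s = new StringTokenizer(value.toString(), "\t");
        String year = s.nextToken();
        String lasttoken = null;
        while (s.hasMoreTokens()) {
            lasttoken = s.nextToken();
        }
        // context.write replaces output.collect; Reporter duties
        // (progress, counters, status) also live on context.
        context.write(new Text(year), new IntWritable(Integer.parseInt(lasttoken)));
    }
}
```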
Comments:
mapred.* are the old API classes. OutputCollector is the old counterpart of Context.write.
Dear @cricket_007, OK, but what do OutputCollector and Reporter actually do? Do they work like org.apache.hadoop.mapreduce.*, and is it just the old API, right?
The mapreduce package uses Mapper and Reducer as full classes rather than interfaces.
Dear @cricket_007, so you mean interfaces in the old API (mapred.*) and classes in the new API (mapreduce.*)? Got it now. Thank you, my dear friend.
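The driver changes too under the new API: org.apache.hadoop.mapreduce.Job replaces JobConf and JobClient. A sketch assuming Hadoop 2.x or later, where E_EMapperNew and E_EReduceNew are hypothetical new-API ports of the classes above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ProcessUnitsNew {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job replaces JobConf + JobClient from the old mapred API.
        Job job = Job.getInstance(conf, "max_eletricityunits");
        job.setJarByClass(ProcessUnitsNew.class);
        job.setMapperClass(E_EMapperNew.class);   // hypothetical new-API mapper
        job.setReducerClass(E_EReduceNew.class);  // hypothetical new-API reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```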