Mapper and Reducer class design in Java MapReduce
I am new to MapReduce, and I have some doubts about how the Mapper and Reducer classes are designed in this code. I am familiar with the map-side join in MapReduce and learned the following:
public static class CustsMapper extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // ...
    }
}
Code:
1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45
package hadoop;

import java.util.*;
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class ProcessUnits
{
    // Mapper class (old org.apache.hadoop.mapred API)
    public static class E_EMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable>
    {
        // Map function: emit (year, last value on the tab-separated line)
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException
        {
            String line = value.toString();
            String lasttoken = null;
            StringTokenizer s = new StringTokenizer(line, "\t");
            String year = s.nextToken();
            while (s.hasMoreTokens())
            {
                lasttoken = s.nextToken();
            }
            int avgprice = Integer.parseInt(lasttoken);
            output.collect(new Text(year), new IntWritable(avgprice));
        }
    }

    // Reducer class
    public static class E_EReduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable>
    {
        // Reduce function: emit every value above the threshold of 30
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException
        {
            int maxavg = 30;
            int val = Integer.MIN_VALUE;
            while (values.hasNext())
            {
                if ((val = values.next().get()) > maxavg)
                {
                    output.collect(key, new IntWritable(val));
                }
            }
        }
    }

    // Main function
    public static void main(String args[]) throws Exception
    {
        JobConf conf = new JobConf(ProcessUnits.class);
        conf.setJobName("max_eletricityunits");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(E_EMapper.class);
        conf.setCombinerClass(E_EReduce.class);
        conf.setReducerClass(E_EReduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
Output:
1981 34
1984 40
1985 45
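To see what the job does without running Hadoop, the mapper's line parsing and the reducer's threshold filter can be sketched in plain Java. The class and method names below are illustrative only, not part of any Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class LogicSketch {
    // Mapper logic: the first token is the year, the last token is the value.
    static String[] parseLine(String line) {
        StringTokenizer s = new StringTokenizer(line, "\t");
        String year = s.nextToken();
        String lasttoken = year;  // fall back if the line has only one token
        while (s.hasMoreTokens()) {
            lasttoken = s.nextToken();
        }
        return new String[] { year, lasttoken };
    }

    // Reducer logic: keep every value strictly above the threshold (30 in the job).
    static List<Integer> filterAbove(List<Integer> values, int threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int v : values) {
            if (v > threshold) kept.add(v);
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] kv = parseLine("1981\t31\t32\t32\t32\t33\t34\t35\t36\t36\t34\t34\t34\t34");
        System.out.println(kv[0] + " -> " + kv[1]);               // 1981 -> 34
        System.out.println(filterAbove(List.of(23, 43, 26, 25), 30)); // [43]
    }
}
```

Note that the reducer emits every value above the threshold per key, so with this logic a year can appear on more than one output line.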
Why do we extend the class from MapReduceBase (what does it do?), and why do we implement the Mapper interface?
Because this is old code, written with the mapred API before Hadoop 2.x existed.
I understand that a Context object should appear here, but what are OutputCollector and Reporter?
They are the earlier version of the Context object.
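For comparison, here is a sketch of the same map logic against the newer org.apache.hadoop.mapreduce API, where a single Context object replaces both OutputCollector (via context.write) and Reporter (via the status/counter methods on context). The class name E_EMapperNew is made up for illustration:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// New-API mapper: extend the Mapper class instead of
// extending MapReduceBase and implementing the Mapper interface.
public class E_EMapperNew extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer s = new StringTokenizer(value.toString(), "\t");
        String year = s.nextToken();
        String lasttoken = null;
        while (s.hasMoreTokens()) {
            lasttoken = s.nextToken();
        }
        // context.write replaces output.collect; Reporter duties
        // (progress, counters, status) also live on context.
        context.write(new Text(year), new IntWritable(Integer.parseInt(lasttoken)));
    }
}
```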
Comments:
mapred.* are the old API classes. OutputCollector is the old counterpart of Context.write.
Dear @cricket_007, OK, but what do OutputCollector and Reporter actually do? Do they work like org.apache.hadoop.mapreduce.*, and is it just the old API, right?
The mapreduce package uses Mapper and Reducer as full classes rather than interfaces.
Dear @cricket_007, so you mean interfaces in the old API (mapred.*) and classes in the new API (mapreduce.*)? Got it now. Thank you, my dear friend.
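The driver changes too under the new API: org.apache.hadoop.mapreduce.Job replaces JobConf and JobClient. A sketch assuming Hadoop 2.x or later, where E_EMapperNew and E_EReduceNew are hypothetical new-API ports of the classes above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ProcessUnitsNew {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job replaces JobConf + JobClient from the old mapred API.
        Job job = Job.getInstance(conf, "max_eletricityunits");
        job.setJarByClass(ProcessUnitsNew.class);
        job.setMapperClass(E_EMapperNew.class);   // hypothetical new-API mapper
        job.setReducerClass(E_EReduceNew.class);  // hypothetical new-API reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```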