Reducer not being called in a Java Hadoop MapReduce job
I have two mapper classes that simply create key-value pairs. My main logic is supposed to be in the reducer part. I am trying to compare data coming from two different text files.
My mapper class is:
public static class Map extends Mapper<LongWritable, Text, Text, Text> {

    private String ky, vl = "a";

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String tokens[] = line.split("\t");
        vl = tokens[1].trim();
        ky = tokens[2].trim();
        // sending key-value pairs to the reducer
        context.write(new Text(ky), new Text(vl));
    }
}
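For a hypothetical tab-separated input line such as 101\tJohn\tIT, this map() would emit the pair (IT, John): the third field becomes the key and the second field the value, while the first field is ignored.

The reducer class is: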
public static class Reduce extends Reducer<Text, Text, Text, Text> {

    private String rslt = "l";

    public void reduce(Text key, Iterator<Text> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        while (values.hasNext()) {
            count++;
        }
        rslt = Integer.toString(count);
        if (count > 1) {
            context.write(key, new Text(rslt));
        }
    }
}
Output:
File System Counters
FILE: Number of bytes read=361621
FILE: Number of bytes written=1501806
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=552085
HDFS: Number of bytes written=150962
HDFS: Number of read operations=28
HDFS: Number of large read operations=0
HDFS: Number of write operations=5
Map-Reduce Framework
Map input records=10783
Map output records=10783
Map output bytes=150962
Map output materialized bytes=172540
Input split bytes=507
Combine input records=0
Combine output records=0
Reduce input groups=7985
Reduce shuffle bytes=172540
Reduce input records=10783
Reduce output records=10783
Spilled Records=21566
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=12
Total committed heap usage (bytes)=928514048
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=150962
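These counters suggest that the custom reducer above is never actually invoked: Reduce output records equals Reduce input records (10783), which is exactly what the framework's default pass-through reduce() produces. The most likely cause is the method signature: reduce(Text, Iterator<Text>, Context) takes an Iterator instead of an Iterable, so it does not override Reducer.reduce() and Hadoop silently falls back to the default implementation (and even if it were called, the while (values.hasNext()) loop never calls values.next(), so it could not terminate). A minimal corrected sketch, keeping the original count > 1 logic, could look like the following; the @Override annotation turns the wrong-signature mistake into a compile-time error:

public static class Reduce extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // count how many records share this key
        int count = 0;
        for (Text ignored : values) {
            count++;
        }
        // only emit keys that occur more than once
        if (count > 1) {
            context.write(key, new Text(Integer.toString(count)));
        }
    }
}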
Comments:

Why do you need these two mapper classes? It looks like they both do the same thing. Can you describe in more detail what is going wrong? Is the reducer not starting at all? What is the exit status of the job?

I use two files because I take user input and store it in another file. The output is the same as the result after the map phase... the records just come out sorted (my guess is that the default reducer is being applied).

I mean that Map and Map2 do the same thing, so Map could be reused. But can you describe what is happening with the reducer? Can you see it on the job tracker?

So your reducer does some work, but the output is not what you expect, right? Can you post a sample of your input? Maybe removing the if clause (count>1) in the reducer would also produce this output.

@0309gunner it is being executed. It clearly shows Reduce input records=10783 and Reduce output records=10783.

The driver code is:
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(CompareTwoFiles.class);
job.setJobName("Compare Two Files and Identify the Difference");
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
MultipleInputs.addInputPath(job, new Path(args[0]),
        TextInputFormat.class, Map.class);
MultipleInputs.addInputPath(job, new Path(args[1]),
        TextInputFormat.class, Map2.class);
job.waitForCompletion(true);
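As a side note from the comments: since Map and Map2 appear to do exactly the same thing, a single mapper class could be registered for both input paths. A minimal sketch of that variant (assuming Map2 really is identical to Map):

MultipleInputs.addInputPath(job, new Path(args[0]),
        TextInputFormat.class, Map.class);
MultipleInputs.addInputPath(job, new Path(args[1]),
        TextInputFormat.class, Map.class);

This does not change the job's behaviour; it only removes the duplicated mapper class.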