MapReduce related - What am I doing wrong here?
I am new to the MapReduce programming paradigm, so my question may sound very silly to many of you; I request everyone to bear with me. I am trying to count the number of occurrences of a particular word in a file, and I wrote the following Java classes for this. The input file for this contains the following entries:
The tiger entered village in the night the the \
Then ... the story continues...
I have put the word 'the' many times on purpose, for my own program.
WordCountMapper.java
package com.demo.map_reduce.word_count.mapper;

import java.io.IOException;

import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    @SuppressWarnings({ "rawtypes", "unchecked" })
    @Override
    protected void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException {
        if(null != value) {
            final String line = value.toString();
            if(StringUtils.containsIgnoreCase(line, "the")) {
                context.write(new Text("the"), new IntWritable(StringUtils.countMatches(line, "the")));
            }
        }
    }
}
WordCountReducer.java
package com.demo.map_reduce.word_count.reducer;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    @SuppressWarnings({ "rawtypes", "unchecked" })
    public void reduce(Text key, Iterable<IntWritable> values, org.apache.hadoop.mapreduce.Reducer.Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (final IntWritable nextValue : values) {
            count += nextValue.get();
        }
        context.write(key, new IntWritable(count));
    }
}
WordCounter.java
package com.demo.map_reduce.word_count;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import com.demo.map_reduce.word_count.mapper.WordCountMapper;
import com.demo.map_reduce.word_count.reducer.WordCountReducer;

public class WordCounter
{
    public static void main(String[] args) {
        final String inputDataPath = "/input/my_wordcount_1/input_data_file.txt";
        final String outputDataDir = "/output/my_wordcount_1";
        try {
            final Job job = Job.getInstance();
            job.setJobName("Simple word count");
            job.setJarByClass(WordCounter.class);
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(inputDataPath));
            FileOutputFormat.setOutputPath(job, new Path(outputDataDir));
            job.waitForCompletion(true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
When I run this program in Hadoop, I get the following output:
the 2
the 1
the 3
I want the result from the reducer to be:
the 4
I am sure that I am doing something wrong, or perhaps I have not understood this completely. Can somebody please help me?
Thanks in advance.
-Niranjan

The problem is that your reduce method is never being called. Because its context parameter is declared as the raw org.apache.hadoop.mapreduce.Reducer.Context instead of the parameterized Context, the method does not actually override Reducer.reduce; it is merely an overload, so Hadoop runs the default reduce implementation, which writes every (key, value) pair from the mapper straight through. That is exactly the "the 2 / the 1 / the 3" output you are seeing.
To make it work, simply change the signature of your reduce function to:
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
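For reference, here is a minimal sketch of what the whole corrected reducer could look like (the package, class name and summing logic are simply taken from the question). Adding @Override is worthwhile: the compiler then rejects any method that does not actually override Reducer.reduce, which would have caught this mismatch at compile time.

package com.demo.map_reduce.word_count.reducer;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    @Override // fails to compile if the signature does not match Reducer.reduce
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // sum the per-line counts emitted by the mapper for this key
        int count = 0;
        for (final IntWritable nextValue : values) {
            count += nextValue.get();
        }
        context.write(key, new IntWritable(count));
    }
}

Here the unqualified Context resolves to the inner class of the parameterized Reducer<Text, IntWritable, Text, IntWritable>, which is what the framework invokes; keeping the method public as in your original code also works, the access modifier is not the issue.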
There is also a problem with the logic itself: you are not normalizing the key and you are not counting whole words; the mapper writes one substring-match count per matching line instead of one record per word. Change your map logic to the following:
protected void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException {
    if(null != value) {
        final String line = value.toString();
        for(String word : line.split("\\s+")) {
            context.write(new Text(word.trim().toLowerCase()), new IntWritable(1));
        }
    }
}
and your reduce logic to the following:

public void reduce(Text key, Iterable<IntWritable> values, org.apache.hadoop.mapreduce.Reducer.Context context)
        throws IOException, InterruptedException {
    int count = 0;
    if(key.toString().trim().toLowerCase().equals("the")) {
        for (final IntWritable nextValue : values) {
            count += nextValue.get();
        }
        context.write(key, new IntWritable(count));
    }
}
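As a purely optional side note (a sketch of my own, not part of the answer above): splitting on whitespace keeps punctuation attached, so tokens such as "continues..." become their own keys. If you ever extend this job beyond counting "the", the map body could strip non-letter characters before emitting. The regex below is an assumption of mine, and the snippet is written against the parameterized Mapper<LongWritable, Text, Text, IntWritable>, so the plain Context type applies.

@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    if (value != null) {
        for (String token : value.toString().split("\\s+")) {
            // lower-case and drop anything that is not a letter, so "The" and "the," land
            // under the same key; the character class [^a-z] is an assumption, not given code
            final String word = token.toLowerCase().replaceAll("[^a-z]", "");
            if (!word.isEmpty()) {
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }
}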
Hi, my reduce() method signature is the same as the one you mentioned. However, I also have the feeling that the reduce() method is not being called (because I cannot see my System.out.println() statements in the console). Do you see any other problem?

You are using org.apache.hadoop.mapreduce.Reducer.Context; just change it to Context.

I got it. After changing the Context type in the signature, Hadoop started calling the Reducer and everything fell into place. Thank you very much.

Thanks for your help. I found that the logic was not the problem; the problem was that the Reducer was not being called because of the mismatched method signature. After making the change zuxoj suggested, the program started working even with my own logic as posted in the original question. Thanks to you as well :)