Getting numbers instead of word counts in the Hadoop word count example on Cloudera

Below is the code we are using. The map class is WCMapper and the reduce class is WCReducer.

We are not sure why the output contains numbers instead of the word counts.

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = key.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            value.set(tokenizer.nextToken());
            context.write(value, new IntWritable(1));
        }
    }
}

public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable x : values) {
            sum += x.get();
        }

        result.set(sum);
        System.out.println("Key: " + key + "Value: " + sum);
        context.write(key, result);
    }
}


public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    Job job = Job.getInstance(conf, "WordCount");

    job.setJarByClass(WorCount.class);
    job.setMapperClass(WCMapper.class);
    job.setReducerClass(WCReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    Path outputPath = new Path(args[1]);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, outputPath);

    outputPath.getFileSystem(conf).delete(outputPath, true);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

Input file:
this is cloudera
this is smart

Expected output:
this 2
is 2
cloudera 1
smart 1

Actual output:
0 1
17 1

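The problem is in your mapper:

String line = key.toString();

Here, key is a LongWritable holding the byte offset of the line within the file, not the text of the line. That is why the output contains offsets (0 and 17) rather than words. If you change that line to read from value instead, and then stop reusing value as the output key further down, you will get the correct answer.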
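
To see where 0 and 17 come from, here is a minimal plain-Java sketch that mimics the keys a line-oriented text input would hand to the mapper. It does not use Hadoop at all: the class OffsetKeyDemo and its methods are hypothetical names made up for illustration, and the two-line input is assumed from the question's sample file.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class OffsetKeyDemo {

    // Byte offset at which each line starts. This plays the role of the
    // LongWritable key; it is a simulation, not Hadoop's own implementation.
    public static List<Long> lineOffsets(String fileContent) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            offsets.add(offset);
            // +1 accounts for the newline that terminates the line
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return offsets;
    }

    // What the mapper from the question emits: it tokenizes the key
    // (the offset) instead of the line text, so each "word" is a number.
    public static List<String> buggyMapOutput(String fileContent) {
        List<String> emitted = new ArrayList<>();
        for (long key : lineOffsets(fileContent)) {
            StringTokenizer tokenizer = new StringTokenizer(Long.toString(key));
            while (tokenizer.hasMoreTokens()) {
                emitted.add(tokenizer.nextToken() + "\t1");
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Assumed sample input: two newline-terminated lines
        String input = "this is cloudera\nthis is smart\n";
        buggyMapOutput(input).forEach(System.out::println);
    }
}
```

The first line is 16 bytes plus a newline, so the second line starts at offset 17. Each offset occurs once, which matches the "0 1" and "17 1" pairs in the output.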

New mapper:

public void map(LongWritable key, Text value, Context context) throws IOException,InterruptedException { 
    String line = value.toString(); 
    StringTokenizer tokenizer = new StringTokenizer(line); 
    Text word = new Text();

    while(tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken()); 
        context.write(word, new IntWritable(1)); 
    }
}
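
As a rough check of the logic, the corrected map step and the summing reduce step can be simulated in plain Java without a cluster. This is only a sketch under assumptions: WordCountSimulation is a hypothetical class, a TreeMap stands in for the shuffle and sort phase, and the input lines are taken from the question's sample file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSimulation {

    // Map step: tokenize the line text (the value), one output per token.
    public static List<String> map(String value) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(value);
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }
        return words;
    }

    // Shuffle and reduce step: group by word and add up one count per
    // occurrence. TreeMap keeps the keys in sorted order.
    public static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : map(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample input, one list element per line of the file
        wordCount(List.of("this is cloudera", "this is smart"))
            .forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

With that input the counts are this=2, is=2, cloudera=1, smart=1, which agrees with the expected output once the mapper reads value rather than key.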
