Getting numbers in the Hadoop word count example on Cloudera


Below is the code we are using. The map class is WCMapper and the reduce class is WCReducer.

I can't figure out why the output contains numbers instead of word counts.

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = key.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            value.set(tokenizer.nextToken());
            context.write(value, new IntWritable(1));
        }
    }
}

public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable x : values) {
            sum += x.get();
        }

        result.set(sum);
        System.out.println("Key: " + key + "Value: " + sum);
        context.write(key, result);
    }
}



public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    Job job = Job.getInstance(conf, "WordCount");

    job.setJarByClass(WorCount.class);
    job.setMapperClass(WCMapper.class);
    job.setReducerClass(WCReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    Path outputPath = new Path(args[1]);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    outputPath.getFileSystem(conf).delete(outputPath, true);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}


Input file:

this is cloudera
this is smart

Expected output:

this 2
is 2
cloudera 1
smart 1

Output obtained:

0 1
17 1

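The problem is in your mapper:

String line = key.toString();

The key in this case is a LongWritable holding the byte offset of the line within the file, not the text of the line. That is why the output contains 0 and 17: they are the offsets at which the two input lines start. If you change that line to read from value, and stop reusing value as the output key further down, you will get the correct answer.

New mapper: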

public void map(LongWritable key, Text value, Context context) throws IOException,InterruptedException { 
    String line = value.toString(); 
    StringTokenizer tokenizer = new StringTokenizer(line); 
    Text word = new Text();

    while(tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken()); 
        context.write(word, new IntWritable(1)); 
    }
}
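
To see why the original job printed 0 and 17, here is a small plain-Java sketch that needs no Hadoop installation. It only imitates the way TextInputFormat hands each line to the mapper as a (byte offset, line text) pair; the class name OffsetDemo and its helper methods are made up for this illustration and are not part of any Hadoop API. It assumes lines end in a single '\n'.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical demo class: simulates mapper input records in plain Java.
public class OffsetDemo {

    // One input record: the byte offset where the line starts, plus its text.
    static final class Rec {
        final long offset;
        final String line;

        Rec(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Split the text into (byte offset, line) records, assuming '\n' line endings.
    static List<Rec> records(String text) {
        List<Rec> out = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            out.add(new Rec(offset, line));
            // Advance past the line's bytes and its trailing newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    // Tokenize the key (the buggy mapper) or the value (the fixed mapper),
    // then sum the occurrences of each token as the reducer would.
    static Map<String, Integer> count(String text, boolean useKey) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Rec r : records(text)) {
            String source = useKey ? Long.toString(r.offset) : r.line;
            StringTokenizer tokenizer = new StringTokenizer(source);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "this is cloudera\nthis is smart\n";
        System.out.println(count(input, true));  // prints {0=1, 17=1}
        System.out.println(count(input, false)); // prints {cloudera=1, is=2, smart=1, this=2}
    }
}
```

The first line is 16 bytes long plus one newline, so the second line starts at offset 17. Tokenizing the key therefore yields exactly the "0 1" and "17 1" seen in the job output, while tokenizing the value yields the expected word counts.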
