Getting numbers instead of words in the Hadoop word count example on Cloudera
Tags: hadoop, cloudera, word-count
Below is the code we are using. The map class is WCMapper and the reduce class is WCReducer. I don't understand why the output contains numbers instead of word counts.
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = key.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            value.set(tokenizer.nextToken());
            context.write(value, new IntWritable(1));
        }
    }
}
public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable x : values) {
            sum += x.get();
        }
        result.set(sum);
        System.out.println("Key: " + key + "Value: " + sum);
        context.write(key, result);
    }
}
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "WordCount");
    job.setJarByClass(WorCount.class);
    job.setMapperClass(WCMapper.class);
    job.setReducerClass(WCReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    Path outputPath = new Path(args[1]);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    outputPath.getFileSystem(conf).delete(outputPath, true);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Input file:
this is cloudera
this is smart
Expected output:
this 2
is 2
cloudera 1
smart 1
Output obtained:
0 1
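17 1
The problem is in your mapper:
String line = key.toString();
The key in this case is a LongWritable holding the byte offset of the line within the file, which is why the output shows 0 and 17 instead of words. If you change that line to read from value, and stop reusing value as the output key further down, you will get the right answer.
New mapper: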
public void map(LongWritable key, Text value, Context context) throws IOException,InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
Text word = new Text();
while(tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, new IntWritable(1));
}
}
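To see the difference without running a cluster, here is a minimal plain-Java sketch that imitates what happens. It is not Hadoop code: the class OffsetKeyDemo and its methods toRecords and count are hypothetical names made up for this illustration, the sample text is assumed to match the question's input file, and "\n" line endings are assumed. It builds (byte offset, line) pairs the way TextInputFormat presents them to map(), then tokenizes either the key or the value.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class OffsetKeyDemo {

    // Splits text into (byte offset -> line) pairs, imitating the records
    // that TextInputFormat passes to map(). Assumes "\n" line endings.
    static Map<Long, String> toRecords(String text) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            records.put(offset, line);
            // Advance by the line's byte length plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Tokenizes the key (the bug) or the value (the fix) of every record
    // and adds up one count per token, as the reducer would.
    static Map<String, Integer> count(Map<Long, String> records, boolean useKey) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<Long, String> rec : records.entrySet()) {
            String line = useKey ? rec.getKey().toString() : rec.getValue();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample input: two lines, the first 16 bytes long.
        Map<Long, String> records = toRecords("this is cloudera\nthis is smart\n");
        System.out.println("tokenizing key:   " + count(records, true));
        System.out.println("tokenizing value: " + count(records, false));
    }
}
```

With that assumed input, tokenizing the key yields {0=1, 17=1}, the same numbers as in the output obtained above, because the second line starts 17 bytes into the file. Tokenizing the value yields {cloudera=1, is=2, smart=1, this=2}.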
Perhaps this question can help you to some extent.