Java MapReduce: finding word-length frequency
Tags: java, hadoop, mapreduce

I am new to MapReduce and would like to ask whether anyone can give me an idea of how to compute word-length frequency with MapReduce. I already have the word count code, but I want to count by word length instead. This is what I have so far:
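import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCount {
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            // Emit (word, 1) for every whitespace-separated token in the line.
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
}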
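Thanks…

For word-length frequency, tokenizer.nextToken() should not be emitted as the key. What matters is the length of that string. So the following single change is enough to make your code work: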
word.set( String.valueOf( tokenizer.nextToken().length() ));
Now, if you look closely, you will see that the Mapper output key should no longer be Text, although that does work. It is better to use an IntWritable key:
public static class Map extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private IntWritable wordLength = new IntWritable();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            wordLength.set(tokenizer.nextToken().length());
            context.write(wordLength, one);
        }
    }
}
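To check what the job should produce before running it on a cluster, the same logic can be sketched in plain Java without Hadoop. This is only a local illustration: the class name WordLengthFrequencySketch and the method lengthFrequencies are hypothetical, and the map and reduce phases are collapsed into a single pass that sums a count of 1 per token, keyed by token length.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-alone class (no Hadoop needed) that mimics the job locally.
class WordLengthFrequencySketch {

    // Each token contributes (length, 1); counts for the same length are summed,
    // which is what a summing reducer would do with the mapper output.
    static Map<Integer, Integer> lengthFrequencies(Iterable<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>(); // keeps keys in ascending order
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                int length = tokenizer.nextToken().length();
                counts.merge(length, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> result = lengthFrequencies(
                Arrays.asList("the quick brown fox", "jumps over the lazy dog"));
        // Print one "length<TAB>count" pair per line.
        result.forEach((length, count) -> System.out.println(length + "\t" + count));
    }
}
```

In the actual Hadoop job, the reducer's input key type and the driver settings (for example job.setOutputKeyClass(IntWritable.class)) would presumably also need to change from Text to IntWritable to match the new mapper output; the question does not show those parts, so treat this as an assumption about the rest of the code.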
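Although most MapReduce examples use StringTokenizer, the String.split method is cleaner and preferable, so change the code accordingly.

Could you explain with an example?

Welcome to SE. Please share examples and the effort you have made so far; that gets you better help. The code you posted looks like a simple Ctrl+C & Ctrl+V! Please try to avoid that.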
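As an illustration of the String.split suggestion in the answer, here is a minimal plain-Java sketch of tokenizing a line with split instead of StringTokenizer. The class name SplitTokenizerSketch and the method wordLengths are hypothetical; inside the mapper, each computed length would be written to the context instead of collected into a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing String.split in place of StringTokenizer.
class SplitTokenizerSketch {

    // Returns the length of every whitespace-separated word in the line.
    static List<Integer> wordLengths(String line) {
        List<Integer> lengths = new ArrayList<>();
        // trim() avoids a leading empty token; "\\s+" treats a run of whitespace as one separator.
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) { // a blank line still yields a single empty token
                lengths.add(token.length());
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(wordLengths("  hello   big world "));
    }
}
```

One difference to keep in mind: unlike StringTokenizer, split can return empty strings (for leading whitespace or an empty line), which is why the sketch trims the input and skips empty tokens.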