
Java MapReduce: tokenizing 3 columns

java hadoop mapreduce


I am writing a map function that needs to read 3 columns. I have a text file:

1234567 12234254 40

How can I change the StringTokenizer in a simple wordcount mapper so that it reads all 3 columns while still using a while loop?

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the job's driver class, as in the standard wordcount example.
public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  private static final IntWritable one = new IntWritable(1);
  private final Text word = new Text();

  // Emits (token, 1) for every whitespace-separated token in the input value.
  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}

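This code already does exactly what you want: with its default delimiters, StringTokenizer splits the input on whitespace, so the while loop emits all three columns. However, as the Javadoc states:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code.

Instead, split the line and use a for loop: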

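private static final IntWritable ONE = new IntWritable(1);
private final Text t = new Text();
...

// Split the line on runs of whitespace and emit (column, 1) for each piece.
for (String column : value.toString().split("\\s+")) {
    t.set(column);
    context.write(t, ONE);
}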
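
To compare the two approaches outside Hadoop, here is a minimal standalone sketch. The class name ColumnSplitDemo and the helper methods withTokenizer and withSplit are hypothetical names chosen for illustration and are not part of the question or of any Hadoop API. The sketch also assumes you want leading whitespace ignored: unlike StringTokenizer, split("\\s+") returns an empty first element when the line starts with whitespace, so the helper trims the line first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ColumnSplitDemo {

    // Legacy approach: StringTokenizer with its default delimiters
    // (space, tab, newline, carriage return, form feed).
    static List<String> withTokenizer(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken());
        }
        return out;
    }

    // Recommended approach: String.split on runs of whitespace.
    // trim() first so leading whitespace does not yield an empty element.
    static List<String> withSplit(String line) {
        List<String> out = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return out; // "".split(...) would otherwise return one empty string
        }
        for (String column : trimmed.split("\\s+")) {
            out.add(column);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "1234567 12234254 40"; // sample row from the question
        System.out.println(withTokenizer(line));
        System.out.println(withSplit(line));
    }
}
```

Both helpers return the same three columns for the sample row. In the mapper, the same loop bodies would call context.write instead of adding to a list.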