Hadoop: Improving the identity mapper in WordCount (Hadoop, MapReduce, YARN)

hadoop mapreduce

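I wrote a map method that reads the map output of the WordCount example [1]. It does not use the IdentityMapper.class that MapReduce provides, but it is the only way I have found to build a working identity mapper for WordCount. The only problem is that this mapper takes far longer than I would like, and I am starting to suspect that I am doing something redundant. Can anyone help me improve my WordCountIdentityMapper code?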

[1] Identity mapper

public class WordCountIdentityMapper extends MyMapper<LongWritable, Text, Text, IntWritable> {
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context
    ) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        word.set(itr.nextToken());
        Integer val = Integer.valueOf(itr.nextToken());
        context.write(word, new IntWritable(val));
    }

    public void run(Context context) throws IOException, InterruptedException {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    }
}

[2] Map class that generates the map output

public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context
    ) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());

        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }

    public void run(Context context) throws IOException, InterruptedException {
        try {
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}

Thanks,

The solution was to replace StringTokenizer with the indexOf() method. It works much better, and I got noticeably better performance.
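The answer does not show the replacement code, so the following is only a minimal sketch of what parsing with indexOf() might look like. It assumes each input line has the form word, a single tab, then the count (the tab is the default separator of TextOutputFormat; adjust it if the job uses a different one). The class name WordCountLineParser and its helper methods are hypothetical and are kept free of Hadoop types so the parsing logic can be run on its own.

```java
// Hypothetical helper: splits a "word<TAB>count" line with indexOf()
// instead of allocating a StringTokenizer for every record.
final class WordCountLineParser {
    private WordCountLineParser() {}

    // Assumption: the key and value are separated by a single tab.
    // Returns the separator position, or -1 if the line has none.
    public static int separatorIndex(String line) {
        return line.indexOf('\t');
    }

    // Everything before the separator is the word.
    public static String word(String line, int sep) {
        return line.substring(0, sep);
    }

    // Everything after the separator is the count; trim() drops a
    // trailing carriage return or stray spaces before parsing.
    public static int count(String line, int sep) {
        return Integer.parseInt(line.substring(sep + 1).trim());
    }

    public static void main(String[] args) {
        String line = "hadoop\t42";
        int sep = separatorIndex(line);
        if (sep > 0) { // skip lines with no separator or an empty word
            System.out.println(word(line, sep) + " -> " + count(line, sep));
        }
    }
}
```

Inside map(), the same idea would be applied to value.toString(), calling word.set(...) on the reused Text field. Keeping a single IntWritable field and calling its set(int) method, rather than creating a new IntWritable for every record, may also reduce allocation; whether that matters for a given job is something to measure rather than assume.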
