Hadoop (Java): changing the type of the mapper output value


java apache hadoop types mapreduce


I am writing a mapper function that emits user IDs as keys, and the values are also of type Text. This is how I did it:

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text userid = new Text();
    private Text catid = new Text();

    /* map method */
    public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ","); /* separated by "," */
        int count = 0;

        userid.set(itr.nextToken());

        while (itr.hasMoreTokens()) {
            if (++count == 3) {
                catid.set(itr.nextToken());
                context.write(userid, catid);
            }else {
                itr.nextToken();
            }
        }
    }
}

Even though I have set the map output value class to Text.class in the driver, I still get the following error at compile time:

popularCategories.java:39: write(org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable)
 in org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,
 org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,
 org.apache.hadoop.io.IntWritable> 
 cannot be applied to (org.apache.hadoop.io.Text,org.apache.hadoop.io.Text)
 context.write(userid, catid);
                           ^

So I would like to understand the difference between the class definition and setting the mapper output value class.

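The class definition declares both the input and the output types. Mapper<Object, Text, Text, IntWritable> says that your mapper receives (Object, Text) and emits (Text, IntWritable), but your map() method calls context.write(userid, catid) with two Text objects, i.e. it emits (Text, Text). In your driver class you have set the mapper's expected output to Text for both key and value, so the Hadoop framework expects the mapper class definition to declare those same output types. The compiler, for its part, only accepts context.write calls whose arguments match the declared KEYOUT and VALUEOUT, which is why the call fails even though the driver setting says Text.

See also the example in the Apache documentation.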
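The compile-time link between the last two type arguments and the parameters of context.write can be illustrated without Hadoop. The sketch below is a simplified model, not Hadoop's API: MiniMapper, MiniContext and TypedUserMapper are hypothetical names, and String plays the role of Text. Because the class is declared with <String, String> as its output types, writing two strings compiles; changing the last type argument to Integer (and the context parameter to MiniContext<String, Integer>) would turn the write call into a compile error much like the one in the question.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Mapper.Context: records every pair written.
// The parameter types of write() are exactly KEYOUT and VALUEOUT.
class MiniContext<KEYOUT, VALUEOUT> {
    final List<Map.Entry<KEYOUT, VALUEOUT>> written = new ArrayList<>();

    void write(KEYOUT key, VALUEOUT value) {
        written.add(new SimpleEntry<>(key, value));
    }
}

// Hypothetical stand-in for org.apache.hadoop.mapreduce.Mapper (types only).
abstract class MiniMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    abstract void map(KEYIN key, VALUEIN value, MiniContext<KEYOUT, VALUEOUT> context);
}

// String stands in for Text. The output types are declared as <String, String>,
// so write(String, String) type-checks.
class TypedUserMapper extends MiniMapper<Object, String, String, String> {
    @Override
    void map(Object key, String line, MiniContext<String, String> context) {
        String[] fields = line.split(",");
        if (fields.length > 3) {
            context.write(fields[0], fields[3]); // (userid, catid)
        }
    }
}
```

Using a separate generic context class keeps the model simple; in Hadoop, Context is an inner class of Mapper, but the effect on type checking is the same.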

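The problem was solved after correcting the mapper's output value type in the class definition, from

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {

to

public static class UserMapper extends Mapper<Object, Text, Text, Text> {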
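Only the fourth type argument changes; the body of map() can stay as it was. To check what that body emits, the tokenizing loop can be copied into a plain Java helper. UserCategoryParser is a hypothetical name, String stands in for Text, and the empty-line guard is an addition (the original loop would throw NoSuchElementException on an empty line). The loop emits the third token after the user ID, i.e. the fourth comma-separated field, at most once per line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hadoop-free copy of the loop in the question's map() method.
// Returns the (userid, catid) pairs that would be passed to context.write.
final class UserCategoryParser {
    static List<String[]> parse(String line) {
        List<String[]> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line, ","); // separated by ","
        if (!itr.hasMoreTokens()) {
            return out; // hypothetical guard for empty lines
        }
        String userid = itr.nextToken();
        int count = 0;
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            if (++count == 3) {
                out.add(new String[] {userid, token}); // emitted as (Text, Text)
            }
        }
        return out;
    }
}
```

Note that StringTokenizer skips empty fields, so a line such as "u1,,a,b,c" yields the pair (u1, c) rather than (u1, b).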

See the related SE question, which I also found useful for understanding the concept clearly.

In your mapper class definition, you have set the output value class to IntWritable:

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable>

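Even though you have set the map output value class to Text in the driver (setMapOutputValueClass(Text.class)), you still need to change the mapper class definition so that its output type parameters stay in sync with the key and value output classes set in the driver.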
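The two settings live at different levels: the generic type arguments are checked by javac, while the classes passed to setMapOutputKeyClass/setMapOutputValueClass are ordinary Class objects that the framework can only compare against what a mapper actually emits at runtime (as far as I understand, Hadoop reports a type-mismatch error in that case). The sketch below models that runtime side only; CheckedMapOutput is a hypothetical name, not a Hadoop class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the driver-side map output classes. Since they are
// runtime values, a mismatch is only detected when a pair is collected.
final class CheckedMapOutput {
    private final Class<?> keyClass;
    private final Class<?> valueClass;
    final List<Object[]> pairs = new ArrayList<>();

    CheckedMapOutput(Class<?> keyClass, Class<?> valueClass) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
    }

    void collect(Object key, Object value) {
        // Exact class comparison (subclasses are rejected too).
        if (key.getClass() != keyClass) {
            throw new IllegalStateException("Type mismatch in key: expected "
                    + keyClass.getName() + ", received " + key.getClass().getName());
        }
        if (value.getClass() != valueClass) {
            throw new IllegalStateException("Type mismatch in value: expected "
                    + valueClass.getName() + ", received " + value.getClass().getName());
        }
        pairs.add(new Object[] {key, value});
    }
}
```

Keeping the class definition and the driver settings in agreement means both checks pass: javac accepts the write calls, and the runtime classes match the configured ones.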

Class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

java.lang.Object
org.apache.hadoop.mapreduce.Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

KEYIN    = offset of the record (input to the Mapper)
VALUEIN  = contents of the line in the record (input to the Mapper)
KEYOUT   = Mapper output key (output of the Mapper, input to the Reducer)
VALUEOUT = Mapper output value (output of the Mapper, input to the Reducer)
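With the default TextInputFormat, KEYIN is the byte offset of each line (a LongWritable) and VALUEIN is the line itself (a Text); the question's mapper declares KEYIN as Object, which accepts that key. The sketch below is a simplified, hypothetical model of how those input pairs are formed, using Long and String instead of the Writable types and assuming "\n" line endings in UTF-8 (the real reader also handles "\r\n" and input splits).

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of TextInputFormat records: (byte offset, line) pairs,
// i.e. the (KEYIN, VALUEIN) values handed to map().
final class LineRecords {
    static List<Map.Entry<Long, String>> read(String fileContents) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.add(new SimpleEntry<>(offset, line));
            // Advance by the line's UTF-8 byte length plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }
}
```

For the input "u1,a,b,c\nu2,d,e,f\n", the keys are 0 and 9, because the first line is 8 bytes long plus its newline.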
