Hadoop (Java): changing the type of the mapper output value

I am writing a mapper function that emits a user id as the key, and the value is also of type Text. This is how I do it:

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text userid = new Text();
    private Text catid = new Text();

    /* map method */
    public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ","); /* separated by "," */
        int count = 0;

        userid.set(itr.nextToken());

        while (itr.hasMoreTokens()) {
            if (++count == 3) {
                catid.set(itr.nextToken());
                context.write(userid, catid);
            }else {
                itr.nextToken();
            }
        }
    }
}
So, although I have set the output value class to Text.class, I still get the following error at compile time:

popularCategories.java:39: write(org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable)
 in org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,
 org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,
 org.apache.hadoop.io.IntWritable> 
 cannot be applied to (org.apache.hadoop.io.Text,org.apache.hadoop.io.Text)
 context.write(userid, catid);
                           ^

So I would like to understand the difference between the class definition and setting the mapper output value class.

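The class definition carries both the input and the output types. In your case, the map method takes Object, Text as input and tries to emit Text, Text. In your driver class, you have set the expected mapper output to Text for both key and value, so the Hadoop framework expects the mapper class definition to declare those output types, and expects your class to emit Text for both key and value when it calls context.write(Text, Text). The declaration Mapper<Object, Text, Text, IntWritable>, however, fixes the output value type at IntWritable, and the compiler checks context.write against that declaration rather than against the driver settings, which is why the call is rejected.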
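To make the compile-time side of this concrete, here is a minimal sketch that does not use Hadoop at all. SketchMapper and SketchContext are hypothetical stand-ins for Mapper and its Context, and String and Integer play the roles of Text and IntWritable. The field positions (first and fourth comma-separated field) follow the map method in the question; split is used instead of StringTokenizer for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Hadoop Context: write() only accepts
// the declared output key and value types.
class SketchContext<KEYOUT, VALUEOUT> {
    final List<String> records = new ArrayList<>();

    void write(KEYOUT key, VALUEOUT value) {
        records.add(key + "\t" + value);
    }
}

// Hypothetical stand-in for the Hadoop Mapper: the output types are
// type parameters of the class, exactly as in Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>.
abstract class SketchMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    abstract void map(KEYIN key, VALUEIN value, SketchContext<KEYOUT, VALUEOUT> context);
}

// VALUEOUT is String (the Text case), so writing a String value compiles.
// Declaring VALUEOUT as Integer (the IntWritable case) would turn the
// context.write(...) call below into a compile-time error.
class UserCategoryMapper extends SketchMapper<Object, String, String, String> {
    @Override
    void map(Object key, String value, SketchContext<String, String> context) {
        String[] fields = value.split(",");
        if (fields.length > 3) {
            // user id is the first field, category id the fourth
            context.write(fields[0], fields[3]);
        }
    }
}

class MapperTypesDemo {
    // Runs the mapper on one input line and returns the emitted records.
    static List<String> run(String line) {
        SketchContext<String, String> context = new SketchContext<>();
        new UserCategoryMapper().map(0L, line, context);
        return context.records;
    }

    public static void main(String[] args) {
        System.out.println(run("u1,a,b,c9,d"));
    }
}
```

Changing the last type argument of UserCategoryMapper to Integer while leaving the body untouched reproduces the same kind of "cannot be applied" error as in the question, independent of anything a driver might configure.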

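The problem was resolved after correcting the mapper output value type in the class definition from

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {
to

public static class UserMapper extends Mapper<Object, Text, Text, Text> {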
Have a look at the related SE question:


I also found this useful for understanding the concepts clearly.

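In your mapper class definition, you set the output value class to IntWritable:

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable>
Even though you have set MapOutputValueClass to Text, you also need to change the mapper class definition so that it stays in sync with the key and value output classes set in the driver.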
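The driver setting and the class definition are checked at different times: the generic declaration by the compiler, the configured output classes while the job runs. The sketch below imitates the run-time half without Hadoop. SketchCollector is a hypothetical stand-in for whatever collects the map output, and the exception type and message are illustrative assumptions, not Hadoop's actual ones.

```java
// Hypothetical stand-in for the map-output collector: it remembers the
// classes configured by the driver and checks every record against them.
class SketchCollector {
    private final Class<?> keyClass;
    private final Class<?> valueClass;

    SketchCollector(Class<?> keyClass, Class<?> valueClass) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
    }

    void collect(Object key, Object value) {
        if (key.getClass() != keyClass) {
            throw new IllegalStateException("Type mismatch in key: expected "
                    + keyClass.getName() + ", received " + key.getClass().getName());
        }
        if (value.getClass() != valueClass) {
            throw new IllegalStateException("Type mismatch in value: expected "
                    + valueClass.getName() + ", received " + value.getClass().getName());
        }
    }
}

class DriverSyncDemo {
    // Returns true when a (String key, value) record passes the check for
    // the given configured value class.
    static boolean accepts(Class<?> configuredValueClass, Object value) {
        SketchCollector collector = new SketchCollector(String.class, configuredValueClass);
        try {
            collector.collect("user42", value);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Driver and mapper agree on String values: accepted.
        System.out.println(accepts(String.class, "cat7"));
        // Driver says Integer, mapper emits String: rejected at run time.
        System.out.println(accepts(Integer.class, "cat7"));
    }
}
```

Keeping both places in sync avoids both failure modes: the compiler accepts context.write, and the configured classes match what the mapper actually emits.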

From the Apache documentation for Mapper:

Class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

java.lang.Object
    org.apache.hadoop.mapreduce.Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

KEYIN = input key for the Mapper (with the default TextInputFormat, the offset of the record)
VALUEIN = input value for the Mapper (the line in the record)
KEYOUT = Mapper output key (output of the Mapper, input to the Reducer)
VALUEOUT = Mapper output value (output of the Mapper, input to the Reducer)
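As a small illustration of how the four positions line up, the following sketch reads the type arguments back from a class declaration with plain Java reflection. FourTypeMapper and its two subclasses are hypothetical stand-ins (no Hadoop classes involved), with String and Integer standing in for Text and IntWritable.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical stand-in with the same four type parameters as Mapper.
abstract class FourTypeMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> { }

// Declaration shaped like the one in the question (value out = Integer).
class IntValueMapper extends FourTypeMapper<Object, String, String, Integer> { }

// Declaration shaped like the corrected one (value out = String).
class TextValueMapper extends FourTypeMapper<Object, String, String, String> { }

class TypeParameterDemo {
    // Returns the name of the type bound to VALUEOUT, the fourth parameter.
    static String valueOutOf(Class<?> mapperClass) {
        ParameterizedType superType =
                (ParameterizedType) mapperClass.getGenericSuperclass();
        Type[] typeArguments = superType.getActualTypeArguments();
        return typeArguments[3].getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(valueOutOf(IntValueMapper.class));
        System.out.println(valueOutOf(TextValueMapper.class));
    }
}
```

The fourth type argument is the one that has to match the value type passed to context.write.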