Hadoop (Java): changing the type of the mapper output value
I am writing a mapper function that emits a user id as the key, and the value is also of type Text. This is how I did it:
public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text userid = new Text();
    private Text catid = new Text();

    /* map method */
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ","); /* separated by "," */
        int count = 0;
        userid.set(itr.nextToken());
        while (itr.hasMoreTokens()) {
            if (++count == 3) {
                catid.set(itr.nextToken());
                context.write(userid, catid);
            } else {
                itr.nextToken();
            }
        }
    }
}
Even though I have set the mapper's output value class to Text.class, I still get the following error at compile time:
popularCategories.java:39: write(org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable)
in org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,
org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,
org.apache.hadoop.io.IntWritable>
cannot be applied to (org.apache.hadoop.io.Text,org.apache.hadoop.io.Text)
context.write(userid, catid);
^
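So I would like to understand the difference between the class definition and setting the mapper output value class.

The class definition carries both the input and the output types. For example, your mapper receives Object, Text and emits Text, Text. In the driver class you have set the expected output of the mapper class to Text for both key and value, so the Hadoop framework expects the mapper class definition to declare those output types, and expects your class to emit Text for both key and value when it calls context.write(Text, Text). The compiler, however, only looks at the generic parameters of the class definition: with IntWritable declared as the output value type, context.write(Text, Text) does not type-check, whatever the driver sets.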
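The way the class definition constrains context.write can be sketched without Hadoop on the classpath. The block below is a minimal illustration under stated assumptions, not Hadoop's implementation: SketchContext, SketchMapper, UserCategoryMapper and MapperTypeDemo are hypothetical stand-ins invented for this example, and String and Integer play the roles of Text and IntWritable.

```java
import java.util.ArrayList;
import java.util.List;

final class MapperTypeDemo {
    // Hypothetical stand-in for Hadoop's Context (not the real API):
    // write() only accepts the output types the mapper declared.
    static class SketchContext<KEYOUT, VALUEOUT> {
        final List<String> written = new ArrayList<>();

        void write(KEYOUT key, VALUEOUT value) {
            written.add(key + "\t" + value);
        }
    }

    // Hypothetical stand-in for Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.
    abstract static class SketchMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
        abstract void map(KEYIN key, VALUEIN value, SketchContext<KEYOUT, VALUEOUT> context);
    }

    // VALUEOUT is String here, so write(String, String) type-checks.
    static class UserCategoryMapper extends SketchMapper<Object, String, String, String> {
        @Override
        void map(Object key, String value, SketchContext<String, String> context) {
            String[] fields = value.split(",");
            if (fields.length >= 2) {
                // Declaring Integer as the fourth type argument instead would turn
                // this call into a compile error, analogous to the one in the question.
                context.write(fields[0], fields[1]);
            }
        }
    }

    // Runs the sketch mapper on one comma-separated line and returns what it wrote.
    static List<String> run(String line) {
        SketchContext<String, String> context = new SketchContext<>();
        new UserCategoryMapper().map(null, line, context);
        return context.written;
    }

    public static void main(String[] args) {
        System.out.println(run("u1,c9"));
    }
}
```

The point of the sketch is that the accepted argument types of write are fixed by the type arguments in the extends clause, so nothing configured at run time can make a mismatched call compile.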
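From the Apache documentation on the Mapper class (the excerpt is quoted at the end of this thread).

Your problem is solved once you correct the mapper output value type in the definition from
public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {
to
public static class UserMapper extends Mapper<Object, Text, Text, Text> {

See also the related SE question, which I found useful for understanding the concept clearly.

In the mapper class definition, you set the output value class to IntWritable: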
public static class UserMapper extends Mapper<Object, Text, Text, IntWritable>
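Even though you have set MapOutputValueClass to Text, you still need to change the mapper class definition so that it stays in sync with the output key and value classes set in the driver.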
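To make "in sync" concrete, here is a small sketch that compares a declared output value type with a configured one using plain Java reflection. It is an illustration only: OutputTypeCheck, SketchMapper, MismatchedMapper, MatchingMapper and valueOutMatches are hypothetical names made up for this example, nothing here is part of Hadoop, and String and Integer again stand in for Text and IntWritable.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

final class OutputTypeCheck {
    // Hypothetical stand-in for Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>; not Hadoop's class.
    abstract static class SketchMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> { }

    // Declares Integer as VALUEOUT, mirroring the IntWritable declaration in the question.
    static class MismatchedMapper extends SketchMapper<Object, String, String, Integer> { }

    // Declares String as VALUEOUT, mirroring the corrected Text declaration.
    static class MatchingMapper extends SketchMapper<Object, String, String, String> { }

    // Returns true when the fourth type argument of the direct generic superclass
    // equals the class that the (hypothetical) driver configured as output value class.
    static boolean valueOutMatches(Class<?> mapperClass, Class<?> configuredValueClass) {
        Type superType = mapperClass.getGenericSuperclass();
        if (!(superType instanceof ParameterizedType)) {
            return false; // raw or non-generic superclass: nothing to compare
        }
        Type[] typeArgs = ((ParameterizedType) superType).getActualTypeArguments();
        return typeArgs.length == 4 && typeArgs[3].equals(configuredValueClass);
    }

    public static void main(String[] args) {
        // The "driver" setting in both calls is String (standing in for Text).
        System.out.println(valueOutMatches(MismatchedMapper.class, String.class));
        System.out.println(valueOutMatches(MatchingMapper.class, String.class));
    }
}
```

The check only inspects the direct superclass, which is enough for a mapper that extends the generic base class directly, as UserMapper does.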
Class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
java.lang.Object
org.apache.hadoop.mapreduce.Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
KEYIN = offset of the record ( input for Mapper )
VALUEIN = value of the line in the record ( input for Mapper )
KEYOUT = Mapper output key ( Output of Mapper, input of Reducer)
VALUEOUT = Mapper output value ( Output of Mapper, input to Reducer)