
Java HBase MapReduce: how to use a custom class as the value for the mapper and/or reducer?


I'm trying to get familiar with Hadoop/HBase MapReduce jobs so that I can write them properly. Right now I have an HBase instance with a table named dns that contains some DNS records. I made a simple unique-domain counter that outputs to a file, and it worked. So far I have only used IntWritable or Text as values, and I'm wondering whether I can use a custom object as the value for my Mapper/Reducer. I tried to do it myself, but I keep getting this error:

Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :null
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:415)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1011)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    ... 9 more
As I said, MapperOutputValue is just a simple class that contains a private integer, a constructor with a parameter, a getter and a setter. I also tried adding a toString method, but it still doesn't work.

So my question is: what is the best way to use a custom class as the output of the mapper / input of the reducer? Also, suppose I want to use a class with several fields as the final output of the reducer. What should that class implement or extend? Is this a good idea at all, or should I stick to "primitives" like IntWritable or Text?

Thanks!

Your MapperOutputValue should implement Writable so that it can be serialized between the tasks of your MapReduce job. Replacing it with the following should work:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class DomainCountWritable implements Writable {
    private Text domain;
    private IntWritable count;

    // Hadoop instantiates Writables reflectively, so a no-arg constructor
    // that initializes both fields is required.
    public DomainCountWritable() {
        this.domain = new Text();
        this.count = new IntWritable(0);
    }

    public DomainCountWritable(Text domain, IntWritable count) {
        this.domain = domain;
        this.count = count;
    }

    public Text getDomain() {
        return this.domain;
    }

    public IntWritable getCount() {
        return this.count;
    }

    public void setDomain(Text domain) {
        this.domain = domain;
    }

    public void setCount(IntWritable count) {
        this.count = count;
    }

    // Deserialize the fields in the same order they are written.
    @Override
    public void readFields(DataInput in) throws IOException {
        this.domain.readFields(in);
        this.count.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        this.domain.write(out);
        this.count.write(out);
    }

    @Override
    public String toString() {
        return this.domain.toString() + "\t" + this.count.toString();
    }
}
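With the class above in place, the value class registered with the job must match. A minimal sketch of the driver change, reusing the "dns" table name plus the scan, Map, and job objects that appear in the question's driver code further below:

// Sketch: register the custom Writable as the map output value class.
// "dns", scan, Map.class and job come from the question's driver code.
TableMapReduceUtil.initTableMapperJob(
        "dns",                      // source HBase table
        scan,                       // configured Scan instance
        Map.class,                  // the TableMapper implementation
        Text.class,                 // map output key class
        DomainCountWritable.class,  // map output value class
        job);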

Comments on the thread:

Does your MapperOutputValue implement Writable? If you're not too familiar with MapReduce, you could also break this down into a simpler problem first: read the DNS records from a file on HDFS, and add the HBase connection once that works.

@BenWatson After I posted this I implemented the WritableComparable interface and was able to make it work, but only with integers. I don't know what the best way to handle strings is. Anyway, thanks for the article.

@BenWatson You could post an answer using the example from that article (or another one, whatever you think best) and I'll accept it. I got it working the way I wanted, and the article was very helpful. Thanks.

Glad I could help.
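On the comment about handling strings: here is a minimal sketch (not from the original thread) of a WritableComparable built around a string field. Text already defines a byte-wise ordering, so compareTo can simply delegate to it:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical example: a key type ordered by a string (a domain name).
public class DomainKeyWritable implements WritableComparable<DomainKeyWritable> {
    private Text domain = new Text();

    // No-arg constructor for Hadoop's reflective instantiation.
    public DomainKeyWritable() {
    }

    public void setDomain(String domain) {
        this.domain.set(domain);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        domain.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        domain.write(out);
    }

    @Override
    public int compareTo(DomainKeyWritable other) {
        // Delegate the string comparison to Text.
        return domain.compareTo(other.domain);
    }

    @Override
    public int hashCode() {
        // Keys need a stable hashCode so partitioning stays consistent.
        return domain.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof DomainKeyWritable
                && domain.equals(((DomainKeyWritable) obj).domain);
    }

    @Override
    public String toString() {
        return domain.toString();
    }
}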
For reference, here is the Reducer from the original question, still typed against the custom MapperOutputValue class:

public class Reduce extends Reducer<Text, MapperOutputValue, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<MapperOutputValue> values, Context context)
            throws IOException, InterruptedException {

        int i = 0;
        for (MapperOutputValue val : values) {
            i += val.getCount();
        }

        context.write(key, new IntWritable(i));
    }
}
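For comparison, a sketch of that Reducer adapted to DomainCountWritable (an adaptation, not code from the thread); getCount() now returns an IntWritable, so the primitive is read via get():

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, DomainCountWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<DomainCountWritable> values, Context context)
            throws IOException, InterruptedException {

        int sum = 0;
        for (DomainCountWritable val : values) {
            // Hadoop reuses the value instance across iterations, so read
            // the primitive out here instead of keeping references to val.
            sum += val.getCount().get();
        }

        context.write(key, new IntWritable(sum));
    }
}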
And the driver configuration from the question:

TableMapReduceUtil.initTableMapperJob(
        "dns",
        scan,
        Map.class,
        Text.class,
        MapperOutputValue.class,
        job);

/* Set output parameters */
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);
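As for the last part of the question (a multi-field class as the reducer's final output): TextOutputFormat renders each value with its toString() method, so a Writable with a sensible toString() works as the final output as well. A sketch of the changed driver lines, assuming the Reducer were changed to emit DomainCountWritable:

/* Set output parameters: TextOutputFormat writes each value using its
   toString(), so DomainCountWritable's tab-separated form lands in the file. */
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DomainCountWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);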