Hadoop HBase MR - key/value mismatch


I am trying to run an MR (MapReduce) job against a standalone HBase (0.94.11).

I have read the HBase API and modified the MR code so that it reads data from an HBase table and writes the results back into an HBase table, but I am getting an exception during the reduce phase. Part of the code (business logic excluded) is given below.

SentimentCalculatorHBase - Tool/driver class:

package com.hbase.mapreduce;

import java.util.Calendar;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SentimentCalculatorHBase extends Configured implements Tool {

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        // TODO Auto-generated method stub
        SentimentCalculatorHBase sentimentCalculatorHBase = new SentimentCalculatorHBase();
        ToolRunner.run(sentimentCalculatorHBase, args);
    }

    @Override
    public int run(String[] arg0) throws Exception {
        // TODO Auto-generated method stub


        System.out
                .println("***********************Configuration started***********************");
        Configuration configuration = getConf();
        System.out.println("Conf: " + configuration);


        Job sentiCalcJob = new Job(configuration, "HBase SentimentCalculation");

        sentiCalcJob.setJarByClass(SentimentCalculatorHBase.class);
        sentiCalcJob.setMapperClass(SentimentCalculationHBaseMapper.class);
        sentiCalcJob.setCombinerClass(SentimentCalculationHBaseReducer.class);
        sentiCalcJob.setReducerClass(SentimentCalculationHBaseReducer.class);


        sentiCalcJob.setInputFormatClass(TableInputFormat.class);
        sentiCalcJob.setOutputFormatClass(TableOutputFormat.class);

        /* Start : Added out of exasperation! */
        sentiCalcJob.setOutputKeyClass(ImmutableBytesWritable.class);
        sentiCalcJob.setOutputValueClass(Put.class);
        /* End : Added out of exasperation! */

        Scan twitterdataUserScan = new Scan();
        twitterdataUserScan.setCaching(500);

        twitterdataUserScan.addColumn("word_attributes".getBytes(),
                "TwitterText".getBytes());

        TableMapReduceUtil.initTableMapperJob("twitterdata_user",
                twitterdataUserScan, SentimentCalculationHBaseMapper.class,
                Text.class, Text.class, sentiCalcJob);

        TableMapReduceUtil.initTableReducerJob("sentiment_output",
                SentimentCalculationHBaseReducer.class, sentiCalcJob);

        Calendar beforeJob = Calendar.getInstance();
        System.out.println("Job Time started---------------- "
                + beforeJob.getTime());
        boolean check = sentiCalcJob.waitForCompletion(true);
        if (check == true) {
            System.out
                    .println("*******************Job completed- SentimentCalculation********************");
        }
        Calendar afterJob = Calendar.getInstance();
        System.out
                .println("Job Time ended SentimentCalculation---------------- "
                        + afterJob.getTime());
        return 0;
    }
}
Mapper class:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;

public class SentimentCalculationHBaseMapper extends TableMapper<Text, Text> {

    private Text sentenseOriginal = new Text();
    private Text sentenseParsed = new Text();

    @Override
    protected void map(
            ImmutableBytesWritable key,
            Result value,
            org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable, Result, Text, Text>.Context context)
            throws IOException, InterruptedException {
        // business logic that populates sentenseOriginal / sentenseParsed is omitted
        context.write(this.sentenseOriginal, this.sentenseParsed);
    }
}
To get around this, I changed the reducer's signature:

public class SentimentCalculationHBaseReducer extends
        TableReducer<Text, Text, Text> {

    @Override
    protected void reduce(
            Text key,
            java.lang.Iterable<Text> values,
            org.apache.hadoop.mapreduce.Reducer<Text, Text, Text, org.apache.hadoop.io.Writable>.Context context)
            throws IOException, InterruptedException {

        // business logic omitted; d3 and put are produced by it
        context.write(new Text(d3.getBytes()), put);
    }
}

This is the exception I am getting:

13/09/05 15:55:20 INFO mapred.JobClient: Task Id : attempt_201309051437_0004_m_000000_0, Status : FAILED
java.io.IOException: wrong value class: class org.apache.hadoop.hbase.client.Put is not class org.apache.hadoop.io.Text

I cannot figure out where I am going against the HBase MR API.

d3 is the key. You can write it like this:

    context.write(null,put);
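A minimal sketch of what the reducer could look like with that change, keeping the question's type parameters; the way the Put is built and the column family/qualifier names are placeholders for the omitted business logic:

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;

    public class SentimentCalculationHBaseReducer extends
            TableReducer<Text, Text, Text> {

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // build the output row from the incoming key; family/qualifier are placeholders
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.add(Bytes.toBytes("word_attributes"), Bytes.toBytes("Sentiment"),
                    Bytes.toBytes(values.iterator().next().toString()));
            // TableOutputFormat takes the target row from the Put itself,
            // so the key passed to context.write can be null
            context.write(null, put);
        }
    }

Since TableOutputFormat only looks at the Put/Delete on the value side, whatever is passed as the output key is effectively ignored.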

For the record, for anyone curious: I ran into exactly the same problem, and my solution was to remove the following line, which defines the combiner function:

sentiCalcJob.setCombinerClass(SentimentCalculationHBaseReducer.class);
The error seems to be caused by the map side trying to run the reduce class (as indicated by Reducer.run being called inside an xxxx_m task, i.e. a map task), which produces the mismatch failure.


Removing the combiner class definition fixed this for me.
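In the run() method from the question, that amounts to registering only the mapper and reducer: a combiner executes inside the map task and must emit the map output types (Text/Text here), while this reducer emits Put values. A sketch of just the affected lines:

    sentiCalcJob.setMapperClass(SentimentCalculationHBaseMapper.class);
    // no setCombinerClass(...): the reducer emits Put values, which the
    // map-side spill rejects because the map output value class is Text
    sentiCalcJob.setReducerClass(SentimentCalculationHBaseReducer.class);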

Follow-up comments:

"That is what I did originally, and I got an NPE."

"Actually, the key is not necessary in the reduce function. How about switching the reducer over to just a Context? Don't use the specific types."

"I was not able to change protected void reduce(ImmutableBytesWritable key, java.lang.Iterable values, org.apache.hadoop.mapreduce.Reducer.Context context) like that (compile error)."

"Just like this: reduce(Text key, Iterable values, Context context)."

"Still looking for an answer?"

"As it happens, I was moved off the PoC, so I never got a chance to test it :'( I will get back once I test it again :|"
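For reference, the plain Context shorthand discussed in the comments does compile, but only when the explicit parameter types match the generics declared on the class; otherwise an @Override-annotated map/reduce method will not compile. A sketch using the mapper from the question, with assumed imports:

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.Text;

    public class SentimentCalculationHBaseMapper extends TableMapper<Text, Text> {

        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            // Context here is shorthand for the inherited
            // Mapper<ImmutableBytesWritable, Result, Text, Text>.Context
            context.write(new Text(), new Text()); // placeholder for the real logic
        }
    }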