Java MapReduce job writing to HBase throws IOException: Pass a Delete or a Put
Using Hadoop 2.4.0 and HBase 0.94.18 on EMR, I am trying to write output directly from the mapper to an HBase table, and I am hitting a nasty IOException: Pass a Delete or a Put when executing the code below.
public class TestHBase {

  static class ImportMapper
      extends Mapper<MyKey, MyValue, ImmutableBytesWritable, Writable> {
    private byte[] family = Bytes.toBytes("f");

    @Override
    public void map(MyKey key, MyValue value, Context context)
        throws IOException, InterruptedException {
      MyItem item = //do some stuff with key/value and create item
      byte[] rowKey = Bytes.toBytes(item.getKey());
      Put put = new Put(rowKey);
      for (String attr : Arrays.asList("a1", "a2", "a3")) {
        byte[] qualifier = Bytes.toBytes(attr);
        put.add(family, qualifier, Bytes.toBytes(item.get(attr)));
      }
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    String input = args[0];
    String table = "table";
    Job job = Job.getInstance(conf, "stuff");
    job.setJarByClass(ImportMapper.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    FileInputFormat.setInputDirRecursive(job, true);
    FileInputFormat.addInputPath(job, new Path(input));
    TableMapReduceUtil.initTableReducerJob(
        table,  // output table
        null,   // reducer class
        job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Does anyone know what I'm doing wrong?
Stack trace:
Error: java.io.IOException: Pass a Delete or a Put
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:125)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:84)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:775)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143. Container exited with a non-zero exit code 143.

If you could show the complete stack trace, I could help you solve the problem more easily. I haven't executed your code, but as far as I can see from it, this could be the problem:
job.setNumReduceTasks(0);
With zero reduce tasks, the Mapper is expected to write your Put
objects directly to Apache HBase.
You could increase setNumReduceTasks, or, if you look at the API, you can find its default value and comment that line out. Thanks for adding the stack trace. Unfortunately, you didn't include the code that throws the exception, so I couldn't trace it fully for you. Instead, I did some searching and found a few things for you.
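The exception itself is raised by TableOutputFormat's record writer, which accepts only Put or Delete values and rejects everything else. Here is a minimal plain-Java sketch of that type check; the Mutation, Put, and Delete classes below are stand-ins for the real ones in org.apache.hadoop.hbase.client, kept only to illustrate the behavior:

```java
import java.io.IOException;

// Sketch of the check TableOutputFormat's TableRecordWriter performs on
// every value it receives. The nested classes are stand-ins for the real
// HBase client classes; only the type-check logic is illustrated here.
public class PassDeleteOrPutCheck {
    static class Mutation {}
    static class Put extends Mutation {}
    static class Delete extends Mutation {}

    // Only Put or Delete values are accepted; anything else raises the
    // "Pass a Delete or a Put" IOException seen in the question.
    static void write(Object value) throws IOException {
        if (value instanceof Put) {
            // the real writer buffers the Put for the output table
        } else if (value instanceof Delete) {
            // the real writer applies the Delete to the output table
        } else {
            throw new IOException("Pass a Delete or a Put");
        }
    }

    public static void main(String[] args) throws IOException {
        write(new Put());    // accepted
        write(new Delete()); // accepted
        try {
            write("some other value type");
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints: Pass a Delete or a Put
        }
    }
}
```

One observation that follows from this check: the main() in the question never calls job.setMapperClass(ImportMapper.class), so the default identity Mapper would run and forward the SequenceFile values unchanged to TableOutputFormat, which would be one way to trip exactly this exception.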
Your stack trace is similar to another one I found, which suggests the problem is this line:
That person solved it by commenting out job.setNumReduceTasks(0);
There is also a similar SO question with the same exception that could not be solved that way; instead, it had a problem with annotations:
Here are some good examples of how to write working code both with setNumReduceTasks at 0 and at 1 or more.

"51.2. HBase MapReduce Read/Write Example

The following is an example of using HBase both as a source and as a sink with MapReduce. This example will simply copy data from one table to another.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleReadWrite");
job.setJarByClass(MyReadWriteJob.class);  // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,     // input table
    scan,            // Scan instance to control CF and attribute selection
    MyMapper.class,  // mapper class
    null,            // mapper output key
    null,            // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    targetTable,     // output table
    null,            // reducer class
    job);
job.setNumReduceTasks(0);

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
And here is the one-or-more example:

"51.4. HBase MapReduce Summary to HBase Example

The following example uses HBase as a MapReduce source and sink with a summarization step. This example will count the number of distinct instances of a value in a table and write those summarized counts to another table.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleSummary");
job.setJarByClass(MySummaryJob.class);  // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,        // input table
    scan,               // Scan instance to control CF and attribute selection
    MyMapper.class,     // mapper class
    Text.class,         // mapper output key
    IntWritable.class,  // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    targetTable,           // output table
    MyTableReducer.class,  // reducer class
    job);
job.setNumReduceTasks(1);  // at least one, adjust as required

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
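The summarization step the quoted example describes (count the distinct instances of a value, then write one summarized count per value) can be sketched in plain Java without the HBase machinery. This is only an illustration of what the mapper/reducer pair conceptually computes; in the real job each resulting entry would become a Put to targetTable:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the summary step from the quoted example: the mapper
// emits (value, 1) pairs and the reducer sums them per value. Table I/O is
// omitted; in the real job each entry would be written as a Put.
public class SummarySketch {
    static Map<String, Integer> summarize(List<String> cellValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : cellValues) {
            counts.merge(v, 1, Integer::sum); // reducer-side sum of the 1s
        }
        return counts;
    }

    public static void main(String[] args) {
        // Values as if read from the source table's scanned cells (made up here).
        Map<String, Integer> counts =
            summarize(Arrays.asList("red", "blue", "red", "red"));
        System.out.println(counts.get("red"));  // prints: 3
        System.out.println(counts.get("blue")); // prints: 1
    }
}
```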
You seem closer to the first example. I want to point out that sometimes there is a reason to set the number of reduce tasks to zero. Now let's look at the stack trace against the code you provided: context.write(new ImmutableBytesWritable(rowKey), put)
is outside the map method. Please fix that first, since it doesn't match what the traceback shows… Thanks for pointing that out, Ruben; that was a copy/paste error. Stack trace added.