Java MapReduce job writing to HBase throws IOException: Pass a Delete or a Put
Using Hadoop 2.4.0 and HBase 0.94.18 on EMR, I am trying to write output directly from the mapper to an HBase table, and I am hitting a nasty IOException: Pass a Delete or a Put when executing the code below.
public class TestHBase {

  static class ImportMapper
      extends Mapper<MyKey, MyValue, ImmutableBytesWritable, Writable> {
    private byte[] family = Bytes.toBytes("f");

    @Override
    public void map(MyKey key, MyValue value, Context context)
        throws IOException, InterruptedException {
      MyItem item = //do some stuff with key/value and create item
      byte[] rowKey = Bytes.toBytes(item.getKey());
      Put put = new Put(rowKey);
      for (String attr : Arrays.asList("a1", "a2", "a3")) {
        byte[] qualifier = Bytes.toBytes(attr);
        put.add(family, qualifier, Bytes.toBytes(item.get(attr)));
      }
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    String input = args[0];
    String table = "table";
    Job job = Job.getInstance(conf, "stuff");
    job.setJarByClass(ImportMapper.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    FileInputFormat.setInputDirRecursive(job, true);
    FileInputFormat.addInputPath(job, new Path(input));
    TableMapReduceUtil.initTableReducerJob(
        table,  // output table
        null,   // reducer class
        job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Does anyone know what I'm doing wrong?
Stack trace:
Error: java.io.IOException: Pass a Delete or a Put
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:125)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:84)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:775)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143. Container exited with a non-zero exit code 143.

If you could show the complete stack trace, I could help you solve the problem more easily. I haven't executed your code, but as far as I can see from it, this could be the problem:
job.setNumReduceTasks(0);
With zero reduce tasks, the Mapper is expected to write your Put
objects directly to Apache HBase.
You could increase setNumReduceTasks, or, if you look at the API, you can find its default value and comment that line out. Thanks for adding the stack trace. Unfortunately, you didn't include the code that throws the exception, so I couldn't trace it fully for you. Instead, I did some searching and found a few things for you.
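The exception itself is raised by TableOutputFormat's record writer, which accepts only Put or Delete values and rejects everything else. Here is a minimal plain-Java sketch of that type check; the Mutation, Put, and Delete classes below are stand-ins for the real ones in org.apache.hadoop.hbase.client, kept only to illustrate the behavior:

```java
import java.io.IOException;

// Sketch of the check TableOutputFormat's TableRecordWriter performs on
// every value it receives. The nested classes are stand-ins for the real
// HBase client classes; only the type-check logic is illustrated here.
public class PassDeleteOrPutCheck {
    static class Mutation {}
    static class Put extends Mutation {}
    static class Delete extends Mutation {}

    // Only Put or Delete values are accepted; anything else raises the
    // "Pass a Delete or a Put" IOException seen in the question.
    static void write(Object value) throws IOException {
        if (value instanceof Put) {
            // the real writer buffers the Put for the output table
        } else if (value instanceof Delete) {
            // the real writer applies the Delete to the output table
        } else {
            throw new IOException("Pass a Delete or a Put");
        }
    }

    public static void main(String[] args) throws IOException {
        write(new Put());    // accepted
        write(new Delete()); // accepted
        try {
            write("some other value type");
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints: Pass a Delete or a Put
        }
    }
}
```

One observation that follows from this check: the main() in the question never calls job.setMapperClass(ImportMapper.class), so the default identity Mapper would run and forward the SequenceFile values unchanged to TableOutputFormat, which would be one way to trip exactly this exception.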
Your stack trace is similar to another one I found, which suggests the problem is this line:
That person solved it by commenting out job.setNumReduceTasks(0);
There is also a similar SO question with the same exception that could not be solved that way; instead, it had a problem with annotations:
Here are some good examples of how to write working code both with setNumReduceTasks at 0 and at 1 or more.

"51.2. HBase MapReduce Read/Write Example

The following is an example of using HBase both as a source and as a sink with MapReduce. This example will simply copy data from one table to another.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleReadWrite");
job.setJarByClass(MyReadWriteJob.class);  // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,     // input table
    scan,            // Scan instance to control CF and attribute selection
    MyMapper.class,  // mapper class
    null,            // mapper output key
    null,            // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    targetTable,     // output table
    null,            // reducer class
    job);
job.setNumReduceTasks(0);

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
And here is the one-or-more example:

"51.4. HBase MapReduce Summary to HBase Example

The following example uses HBase as a MapReduce source and sink with a summarization step. This example will count the number of distinct instances of a value in a table and write those summarized counts to another table.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleSummary");
job.setJarByClass(MySummaryJob.class);  // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,        // input table
    scan,               // Scan instance to control CF and attribute selection
    MyMapper.class,     // mapper class
    Text.class,         // mapper output key
    IntWritable.class,  // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    targetTable,           // output table
    MyTableReducer.class,  // reducer class
    job);
job.setNumReduceTasks(1);  // at least one, adjust as required

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
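The summarization step the quoted example describes (count the distinct instances of a value, then write one summarized count per value) can be sketched in plain Java without the HBase machinery. This is only an illustration of what the mapper/reducer pair conceptually computes; in the real job each resulting entry would become a Put to targetTable:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the summary step from the quoted example: the mapper
// emits (value, 1) pairs and the reducer sums them per value. Table I/O is
// omitted; in the real job each entry would be written as a Put.
public class SummarySketch {
    static Map<String, Integer> summarize(List<String> cellValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : cellValues) {
            counts.merge(v, 1, Integer::sum); // reducer-side sum of the 1s
        }
        return counts;
    }

    public static void main(String[] args) {
        // Values as if read from the source table's scanned cells (made up here).
        Map<String, Integer> counts =
            summarize(Arrays.asList("red", "blue", "red", "red"));
        System.out.println(counts.get("red"));  // prints: 3
        System.out.println(counts.get("blue")); // prints: 1
    }
}
```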
You seem closer to the first example. I want to point out that sometimes there is a reason to set the number of reduce tasks to zero. Now let's look at the stack trace against the code you provided: context.write(new ImmutableBytesWritable(rowKey), put)
is outside the map method. Please fix that first, since it doesn't match what the traceback shows… Thanks for pointing that out, Ruben; that was a copy/paste error. Stack trace added.