Hadoop Sqoop import into HBase fails with a large dataset


I'm having a problem importing a large dataset, roughly 5 million records, into HBase with Sqoop. The MapReduce job starts, but stalls at around 30% and then returns the error message below.

I looked around, found a suggestion about task timeouts, and adjusted my command by adding `-D mapred.task.timeout=0` after `import` and setting `-m`, just to try it, but the end result is the same, although it now stalls at 90% instead.

The Sqoop import command is shown below. Am I missing any arguments, or do I need to add something to hbase-site.xml or the zoo.cfg configuration file?

> ./sqoop import --connect  import -D mapred.task.timeout=0 'jdbc:sqlserver://192.168.4.1:1433;database=dbname;user=sa;password=password' --table user --hbase-table newtable --column-family cf1 --hbase-row-key id --hbase-create-table --split-by id -m 14
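One thing worth noting about the command as pasted: the generic Hadoop option `-D mapred.task.timeout=0` appears after `--connect`, but Hadoop's generic options parser only picks up `-D` options that come immediately after the tool name, before any Sqoop-specific arguments. A corrected invocation might look like the sketch below (connection string, table names, and mapper count are copied from the command in the question, not verified against any cluster):

```shell
# Generic Hadoop options (-D key=value) must directly follow "import",
# before Sqoop-specific arguments, or they are silently ignored.
./sqoop import \
  -D mapred.task.timeout=0 \
  --connect 'jdbc:sqlserver://192.168.4.1:1433;database=dbname;user=sa;password=password' \
  --table user \
  --hbase-table newtable \
  --column-family cf1 \
  --hbase-row-key id \
  --hbase-create-table \
  --split-by id \
  -m 14
```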

    13/10/24 15:06:29 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
13/10/24 15:06:29 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 3388@cloudera
13/10/24 15:06:29 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x141e977a64e0004, negotiated timeout = 40000
13/10/24 15:06:29 INFO zookeeper.ClientCnxn: EventThread shut down
13/10/24 15:06:29 INFO zookeeper.ZooKeeper: Session: 0x141e977a64e0004 closed
13/10/24 15:06:29 INFO mapreduce.HBaseImportJob: Creating missing HBase table ai
13/10/24 15:06:30 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@11d1284a
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
13/10/24 15:06:30 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 3388@cloudera
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x141e977a64e0005, negotiated timeout = 40000
13/10/24 15:06:30 INFO zookeeper.ZooKeeper: Session: 0x141e977a64e0005 closed
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: EventThread shut down
13/10/24 15:06:31 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([AIIDX]), MAX([AIIDX]) FROM [ai_view]
13/10/24 15:06:32 INFO mapred.JobClient: Running job: job_201310241455_0001
13/10/24 15:06:33 INFO mapred.JobClient:  map 0% reduce 0%
13/10/24 15:08:24 INFO mapred.JobClient:  map 7% reduce 0%
13/10/24 15:08:50 INFO mapred.JobClient:  map 14% reduce 0%
13/10/24 15:10:11 INFO mapred.JobClient:  map 21% reduce 0%
13/10/24 15:10:51 INFO mapred.JobClient:  map 28% reduce 0%
13/10/24 15:12:16 INFO mapred.JobClient:  map 35% reduce 0%
13/10/24 15:12:57 INFO mapred.JobClient:  map 42% reduce 0%
13/10/24 15:14:12 INFO mapred.JobClient:  map 50% reduce 0%
13/10/24 15:14:55 INFO mapred.JobClient:  map 57% reduce 0%
13/10/24 15:16:35 INFO mapred.JobClient:  map 64% reduce 0%
13/10/24 15:17:28 INFO mapred.JobClient:  map 71% reduce 0%
13/10/24 15:18:42 INFO mapred.JobClient:  map 78% reduce 0%
13/10/24 15:19:24 INFO mapred.JobClient:  map 85% reduce 0%
13/10/24 15:20:44 INFO mapred.JobClient:  map 92% reduce 0%
13/10/24 16:28:28 INFO mapred.JobClient: Task Id : attempt_201310241455_0001_m_000013_0, Status : FAILED
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:390)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:436)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1133)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:980)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
    at com.sun.proxy.$Proxy7.getClosestRowBefore(Unknown Source)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1137)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1000)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:975)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1214)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:961)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1678)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1563)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:990)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:846)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:822)
    at org.apache.sqoop.hbase.HBasePutProcessor.accept(HBasePutProcessor.java:150)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.write(DelegatingOutputFormat.java:128)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.write(DelegatingOutputFormat.java:92)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.sqoop.mapreduce.HBaseImportMapper.map(HBaseImportMapper.java:38)
    at org.apache.sqoop.mapreduce.HBaseImportMapper.map(HBaseImportMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/10/24 16:33:12 INFO mapred.JobClient: Task Id : attempt_201310241455_0001_m_000013_1, Status : FAILED
java.lang.RuntimeException: Could not access HBase table ai
    at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:122)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.<init>(DelegatingOutputFormat.java:107)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat.getRecordWriter(DelegatingOutputFormat.java:82)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for ai,,99999999999999 after 14 tries.
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1095)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1000)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1102)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:961)
    at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:251)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:155)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:129)
    at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:120)
    ... 12 more

13/10/24 16:37:58 INFO mapred.JobClient: Task Id : attempt_201310241455_0001_m_000013_2, Status : FAILED
java.lang.RuntimeException: Could not access HBase table ai
    at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:122)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.<init>(DelegatingOutputFormat.java:107)
    at org.apache.sqoop.mapreduce.DelegatingOutputFormat.getRecordWriter(DelegatingOutputFormat.java:82)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for ai,,99999999999999 after 14 tries.
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1095)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1000)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1102)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:961)
    at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:251)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:155)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:129)
    at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:120)
    ... 12 more

13/10/24 16:42:44 INFO mapred.JobClient: Job complete: job_201310241455_0001
13/10/24 16:42:44 INFO mapred.JobClient: Counters: 18
13/10/24 16:42:44 INFO mapred.JobClient:   Job Counters 
13/10/24 16:42:44 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6610795
13/10/24 16:42:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/10/24 16:42:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/10/24 16:42:44 INFO mapred.JobClient:     Launched map tasks=17
13/10/24 16:42:44 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/10/24 16:42:44 INFO mapred.JobClient:     Failed map tasks=1
13/10/24 16:42:44 INFO mapred.JobClient:   File Output Format Counters 
13/10/24 16:42:44 INFO mapred.JobClient:     Bytes Written=0
13/10/24 16:42:44 INFO mapred.JobClient:   FileSystemCounters
13/10/24 16:42:44 INFO mapred.JobClient:     HDFS_BYTES_READ=1498
13/10/24 16:42:44 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1089897
13/10/24 16:42:44 INFO mapred.JobClient:   File Input Format Counters 
13/10/24 16:42:44 INFO mapred.JobClient:     Bytes Read=0
13/10/24 16:42:44 INFO mapred.JobClient:   Map-Reduce Framework
13/10/24 16:42:44 INFO mapred.JobClient:     Map input records=4782546
13/10/24 16:42:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2150453248
13/10/24 16:42:44 INFO mapred.JobClient:     Spilled Records=0
13/10/24 16:42:44 INFO mapred.JobClient:     CPU time spent (ms)=313010
13/10/24 16:42:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=1125842944
13/10/24 16:42:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=13256167424
13/10/24 16:42:44 INFO mapred.JobClient:     Map output records=4782546
13/10/24 16:42:44 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1498
13/10/24 16:42:44 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 5,773.4138 seconds (0 bytes/sec)
13/10/24 16:42:44 INFO mapreduce.ImportJobBase: Retrieved 4782546 records.
13/10/24 16:42:44 ERROR tool.ImportTool: Error during import: Import job failed!

If you believe the username, password, and port are correct, then you may have to install the JDBC driver for SQL Server yourself. Sqoop does not ship with third-party JDBC drivers.
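If the driver is indeed missing, a common fix is to copy Microsoft's JDBC driver jar into Sqoop's lib directory so it is on the classpath. The jar name and install path below are typical examples and may differ on your distribution:

```shell
# Copy the SQL Server JDBC driver (downloaded from Microsoft) into Sqoop's
# lib directory; the exact jar version and Sqoop path depend on your setup.
cp sqljdbc4.jar /usr/lib/sqoop/lib/
```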


You appear to be using Cloudera, so please check this

Your root problem appears to be `java.lang.RuntimeException: Could not access HBase table ai`, which suggests the issue is on the HBase side rather than the Sqoop side. I would recommend checking the HBase logs to see whether HBase is running properly.

Hi Jarek, thanks for the reply. Just to clarify: the HBase table in the issued command should be "ai" rather than "customers" — I pasted an earlier command that used a different table name. The end result is still the same, though. I've added the output of my HBase logs in the comments below.
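A quick way to act on that advice is to check HBase's health from the shell before re-running the import. The commands below use standard HBase shell usage; the table name `ai` comes from the stack trace, and the log path is a typical location that may differ on your cluster:

```shell
# Basic HBase health checks: cluster status, table existence, and whether
# a region is actually serving the table (the import fails locating regions).
hbase shell <<'EOF'
status 'simple'
list
scan 'ai', {LIMIT => 1}
EOF

# Then inspect the region server and master logs for errors, e.g.:
# tail -n 100 /var/log/hbase/hbase-hbase-regionserver-*.log
```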