Hive insert query with dynamic partitioning on Hadoop


I have a partitioned and bucketed Hive table in ORC format, and I have to insert data into it from a temp table.

Create table statements:

ORC table:

CREATE EXTERNAL TABLE DYN_TGT_TABLE(ID INT, USRNAME VARCHAR(20), LD TIMESTAMP)
PARTITIONED BY (PART VARCHAR(20))
CLUSTERED BY (id) INTO 1024 BUCKETS
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION '/tmp/dyn_tgt_table'
TBLPROPERTIES ('transactional'='true','transient_lastDdlTime'='1461157247');

Temp table:

CREATE EXTERNAL TABLE DYN_TEMP_TABLE(ID INT, USRNAME VARCHAR(20), PART VARCHAR(20), LD TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/tmp/dyn_temp_table'
TBLPROPERTIES ('transactional'='TRUE','transient_lastDdlTime'='1461216088');
Added 5 records to the temp table:

hive> select * from DYN_TEMP_TABLE;

Time taken: 0.166 seconds, Fetched: 5 row(s)
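For a dynamic-partition insert into a bucketed transactional table, the Hive session typically needs settings along these lines (a sketch assuming Hive 1.x; the property names are standard Hive configs, but your site defaults may already cover some of them):

```sql
-- Dynamic partitioning must be enabled; "nonstrict" mode lets every
-- partition column be resolved dynamically from the SELECT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The target table is bucketed and transactional, so bucketing
-- enforcement and the ACID transaction manager must be on.
SET hive.enforce.bucketing=true;
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.txn.DbTxnManager;
```

With these set, the INSERT below should at least be accepted by the planner; the runtime failure described next happened even with dynamic partitioning enabled.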

The following dynamic insert query errors out:

insert into TABLE DYN_TGT_TABLE PARTITION(part) select id,usrname,ld,part from DYN_TEMP_TABLE;
Error message:

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0)


Please help me identify the cause of this error.

Try INSERT OVERWRITE TABLE… Insert overwrite is not allowed against an OutputFormat that implements AcidOutputFormat, so that does not work for this ORC transactional table. I was able to get past the error by reducing the number of buckets from 1024 to 256 (512 also worked), but I have not found the root cause. Additional information: with buckets set to 1024, only 512 reducers were launched on my cluster.
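As a sketch of the workaround described above (table, column, and path names taken from the question; the bucket count is the only change), the target table can be recreated with fewer buckets. `STORED AS ORC` is shorthand for the explicit OrcSerde/OrcInputFormat/OrcOutputFormat clauses in the original DDL:

```sql
CREATE EXTERNAL TABLE DYN_TGT_TABLE(ID INT, USRNAME VARCHAR(20), LD TIMESTAMP)
PARTITIONED BY (PART VARCHAR(20))
CLUSTERED BY (id) INTO 256 BUCKETS   -- reduced from 1024; 512 also worked
STORED AS ORC
LOCATION '/tmp/dyn_tgt_table'
TBLPROPERTIES ('transactional'='true');
```

The observation that only 512 reducers launched for 1024 buckets suggests the reducer cap may be involved; raising `hive.exec.reducers.max` (a real Hive property, default-limited on many clusters) would be one thing to test, though this is speculation rather than a confirmed root cause.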
insert into TABLE DYN_TGT_TABLE PARTITION(part) select id,usrname,ld,part from DYN_TEMP_TABLE;
{"key":{},"value":{"_col0":15,"_col1":"user4","_col2":"2016-05-06 06:31:48","_col3":"B"}}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":15,"_col1":"user4","_col2":"2016-05-06 06:31:48","_col3":"B"}}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
        ... 7 more
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:755)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:683)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
        at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
        ... 7 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask