Hadoop: Hive insert query with dynamic partitioning
I have a partitioned and clustered Hive table in ORC format, and I need to insert data into it from a temp table.
Create table statements:
ORC table:
CREATE EXTERNAL TABLE DYN_TGT_TABLE(ID INT,USRNAME VARCHAR(20), LD TIMESTAMP) PARTITIONED BY (PART VARCHAR(20)) CLUSTERED BY (id) INTO 1024 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION '/tmp/dyn_tgt_table' TBLPROPERTIES ('transactional'='true','transient_lastDdlTime'='1461157247');
temp table:
CREATE EXTERNAL TABLE DYN_TEMP_TABLE(ID INT,USRNAME VARCHAR(20),PART VARCHAR(20), LD TIMESTAMP) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION '/tmp/dyn_temp_table' TBLPROPERTIES ('transactional'='TRUE','transient_lastDdlTime'='1461216088');
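I added 5 records to the temp table:
hive> select * from DYN_TEMP_TABLE;
OK
Time taken: 0.166 seconds, Fetched: 5 row(s)
The following dynamic-partition insert query fails: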
insert into TABLE DYN_TGT_TABLE PARTITION(part) select id,usrname,ld,part from DYN_TEMP_TABLE;
Error message:
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0)
Please help me identify the cause of this error.
Try INSERT OVERWRITE TABLE ...
INSERT OVERWRITE TABLE is not allowed for an OutputFormat that implements AcidOutputFormat, so it does not work for the ORC table.
I was able to work around this by reducing the number of buckets from 1024 to 256 (512 also works), but I did not find the root cause. Additional information: with the bucket count set to 1024, only 512 reducers were launched on my cluster.
The failing row and stack trace from the reduce task:
{"key":{},"value":{"_col0":15,"_col1":"user4","_col2":"2016-05-06 06:31:48","_col3":"B"}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":15,"_col1":"user4","_col2":"2016-05-06 06:31:48","_col3":"B"}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:755)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:683)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
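The workaround reported in the thread (fewer buckets) can be sketched in HiveQL as below. This is a minimal sketch, not a confirmed fix: the table name DYN_TGT_TABLE_256 and its location are hypothetical, and the DDL otherwise mirrors the question's target table. The idea that a reducer cap is involved is an unconfirmed hypothesis based only on the observation that 512 reducers were launched for 1024 buckets. The thread did not establish the root cause. hive.enforce.bucketing applies to older Hive releases such as the one in the stack trace.

```sql
-- Session settings commonly needed for a dynamic-partition insert into a bucketed table.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.enforce.bucketing=true;   -- older Hive releases only

-- Print the current reducer limits (SET with no value shows the setting).
-- Assumption: a cap near 512 would explain why only 512 reducers ran for 1024 buckets.
SET hive.exec.reducers.max;
SET mapreduce.job.reduces;

-- Hypothetical replacement table: same schema as DYN_TGT_TABLE, but 256 buckets,
-- the value the thread reports as working (512 also worked).
CREATE EXTERNAL TABLE DYN_TGT_TABLE_256 (
  ID INT,
  USRNAME VARCHAR(20),
  LD TIMESTAMP
)
PARTITIONED BY (PART VARCHAR(20))
CLUSTERED BY (ID) INTO 256 BUCKETS
STORED AS ORC
LOCATION '/tmp/dyn_tgt_table_256'   -- hypothetical path
TBLPROPERTIES ('transactional'='true');

-- Same insert as in the question; the partition column must come last in the SELECT list.
INSERT INTO TABLE DYN_TGT_TABLE_256 PARTITION (PART)
SELECT ID, USRNAME, LD, PART FROM DYN_TEMP_TABLE;
```

If the printed reducer limit is below the bucket count, raising it to at least the number of buckets would be one thing to try before shrinking the table, though the thread does not confirm that this resolves the NullPointerException.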