Hadoop Hive 0.14.0.2.2.4.10-1: multi-insert - empty partitions
I am trying to do a multi-insert using the query below:
From kiran.employee_part ep
insert overwrite table kiran.employee_ext_part
partition (pdept = 'gbm', pspm = 'ajay')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept = 'gbm' and ep.pspm = 'ajay'
insert overwrite table kiran.employee_ext_part
partition (pdept='rw' , pspm='prashanth')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='rw' and ep.pspm='prashanth'
insert overwrite table kiran.employee_ext_part
partition (pdept='test' , pspm='test')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='test' and ep.pspm='test'
insert overwrite table kiran.employee_ext_part partition (pdept='test1' , pspm='test1')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='test1' and ep.pspm='test1';
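The select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='test1' and ep.pspm='test1' query returns no rows, as expected. Each of the other select queries returns only a few rows. After I run the query above, my entire kiran.employee_ext_part table ends up empty (every data column is NULL), as shown below: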
hive> select * from employee_ext_part;
OK
employee_ext_part.id employee_ext_part.name employee_ext_part.dept employee_ext_part.skill employee_ext_part.sal employee_ext_part.mgr employee_ext_part.spm employee_ext_part.comment employee_ext_part.pdept employee_ext_part.pspm
NULL NULL NULL NULL NULL NULL NULL NULL gbm ajay
NULL NULL NULL NULL NULL NULL NULL NULL rw prashanth
NULL NULL NULL NULL NULL NULL NULL NULL test test
Time taken: 8.116 seconds, Fetched: 3 row(s)
If I comment out the last insert and run the script again, the table is populated with the expected values:
From kiran.employee_part ep
insert overwrite table kiran.employee_ext_part
partition (pdept = 'gbm', pspm = 'ajay')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept = 'gbm' and ep.pspm = 'ajay'
insert overwrite table kiran.employee_ext_part
partition (pdept='rw' , pspm='prashanth')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='rw' and ep.pspm='prashanth'
insert overwrite table kiran.employee_ext_part
partition (pdept='test' , pspm='test')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='test' and ep.pspm='test'
--insert overwrite table kiran.employee_ext_part
--partition (pdept='test1' , pspm='test1')
--select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='test1' and ep.pspm='test1'
;
hive> select * from employee_ext_part;
OK
employee_ext_part.id employee_ext_part.name employee_ext_part.dept employee_ext_part.skill employee_ext_part.sal employee_ext_part.mgr employee_ext_part.spm employee_ext_part.comment employee_ext_part.pdept employee_ext_part.pspm
11 devillers gbm plsql 1000.0 brijesh ajay NULL gbm ajay
12 fafdu gbm plsql 5000.0 kiran ajay NULL gbm ajay
13 steyn gbm ba 10000.0 sudeep ajay NULL gbm ajay
18 duminy gbm hr 100001.0 smith ajay NULL gbm ajay
15 albe rw testing 100.0 venu prashanth NULL rw prashanth
19 miller rw testing 1000.0 ram prashanth NULL rw prashanth
20 pointin rw testing 8989.0 ram prashanth NULL rw prashanth
21 rhodes rw tesging 9090.0 ram prashanth NULL rw prashanth
15 albe rw testing 100.0 venu prashanth NULL test test
19 miller rw testing 1000.0 ram prashanth NULL test test
20 pointin rw testing 8989.0 ram prashanth NULL test test
21 rhodes rw tesging 9090.0 ram prashanth NULL test test
Time taken: 0.295 seconds, Fetched: 12 row(s)
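Can anyone tell me what is going wrong? Is a multi-insert supposed to behave like this when one of its queries returns nothing, or am I missing something?
BTW - sorry about the column headers; I couldn't align them properly.
Disclaimer: I'm not a big fan of multi-table inserts, especially when the "multiple tables" are all the same table with different partitions. If you can't fix your script, why not try a more straightforward approach, such as: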
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table KIRAN.EMPLOYEE_EXT_PART
partition (PDEPT, PSPM)
-- dynamic partition columns go last in the select list, in the same order as the PARTITION clause
select ID, NAME, DEPT, SKILL, SAL, MGR, SPM, COMMENT,
PDEPT, PSPM
from KIRAN.EMPLOYEE_PART
--where ....
;
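Along the same lines, here is a minimal sketch of another straightforward option: split the multi-insert into independent statements, one per static partition. Each branch then runs as its own job, and the empty test1 branch no longer shares a plan with the others. The table, column and partition values come from the question. Whether this actually avoids the NULL rows on this particular Hive build is an assumption that would need to be tested.

```sql
-- One INSERT OVERWRITE per target partition (hypothetical restructuring of the
-- question's multi-insert; names and partition values are copied from it).
insert overwrite table kiran.employee_ext_part partition (pdept='gbm', pspm='ajay')
select ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment
from kiran.employee_part ep
where ep.pdept='gbm' and ep.pspm='ajay';

insert overwrite table kiran.employee_ext_part partition (pdept='rw', pspm='prashanth')
select ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment
from kiran.employee_part ep
where ep.pdept='rw' and ep.pspm='prashanth';

insert overwrite table kiran.employee_ext_part partition (pdept='test', pspm='test')
select ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment
from kiran.employee_part ep
where ep.pdept='test' and ep.pspm='test';

-- This source partition has no rows (per the question); running it on its own
-- keeps it from sharing a job with the statements above.
insert overwrite table kiran.employee_ext_part partition (pdept='test1', pspm='test1')
select ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment
from kiran.employee_part ep
where ep.pdept='test1' and ep.pspm='test1';
```

The trade-off is that the source table is scanned four times instead of once, which the multi-insert form was meant to avoid.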
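Comment #1: your first script stops abruptly at an insert overwrite with no select after it (?!?), but the second script shows a commented-out select. Please make sure you show the actual script that was run.
// Comment #2: Are you using TEZ or MapReduce?
// Comment #3: What is the table structure: TEXT, AVRO, SEQUENCE, ORC, Parquet? And the column types: all STRING?
Sorry for the trouble. I've edited the post. I'm using MR, and the table is stored as text. I'm well aware of that approach, but I just want to understand why the values come out NULL. Ideally it shouldn't behave like this, right?
Every Hive release has hundreds of bugs. Some are tied to a specific SerDe, some to a specific feature, some to a specific execution engine (e.g. TEZ), some to a specific set of configuration properties, and some to a combination of several factors. In my opinion, trying to "understand" an issue that may be specific to your current setup is pointless unless (a) you have to find a workaround, or (b) you have the skills to debug the damn thing and contribute a bug fix to Apache.
Yes, every release has many bugs. The point is that if something doesn't work as expected, you should make sure the implementation is not at fault before concluding that it's a bug in the tool itself. That's what I was trying to do.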
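In the spirit of ruling out the implementation first, a few read-only checks could help narrow things down. This is a hedged sketch. It uses only the tables from the question. Comparing partition-level and table-level storage details is a guess at where to look, not a confirmed cause.

```sql
-- 1. Which source partitions actually contain rows?
--    (the question expects pdept='test1', pspm='test1' to have none)
select ep.pdept, ep.pspm, count(*) as cnt
from kiran.employee_part ep
group by ep.pdept, ep.pspm;

-- 2. Which target partitions exist after the multi-insert?
show partitions kiran.employee_ext_part;

-- 3. Compare a target partition's storage details (location, InputFormat,
--    SerDe) with the table-level definition.
describe formatted kiran.employee_ext_part partition (pdept='gbm', pspm='ajay');
describe formatted kiran.employee_ext_part;
```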