Hadoop Hive 0.14.0.2.2.4.10-1: multi-insert with an empty partition


I am trying to do a multi-insert using the query below:

From kiran.employee_part ep
insert overwrite table kiran.employee_ext_part
partition (pdept = 'gbm', pspm = 'ajay')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment  where ep.pdept = 'gbm' and ep.pspm = 'ajay'
insert overwrite table kiran.employee_ext_part
partition (pdept='rw' , pspm='prashanth')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='rw' and ep.pspm='prashanth'
insert overwrite table kiran.employee_ext_part
partition (pdept='test' , pspm='test')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='test' and ep.pspm='test'
insert overwrite table kiran.employee_ext_part partition (pdept='test1' , pspm='test1')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='test1' and ep.pspm='test1';
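The last select (where ep.pdept='test1' and ep.pspm='test1') returns no rows, as expected. The other selects each return only a few rows. After running the query above, my entire kiran.employee_ext_part table loses its data: every partition contains a single all-NULL row, as shown below: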

hive> select * from employee_ext_part;
OK
employee_ext_part.id    employee_ext_part.name  employee_ext_part.dept  employee_ext_part.skill employee_ext_part.sal   employee_ext_part.mgr   employee_ext_part.spm   employee_ext_part.comment       employee_ext_part.pdept        employee_ext_part.pspm
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    gbm     ajay
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    rw      prashanth
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    test    test
Time taken: 8.116 seconds, Fetched: 3 row(s)
If I comment out the last insert and run it again, the table is populated with the expected values:

From kiran.employee_part ep
insert overwrite table kiran.employee_ext_part
partition (pdept = 'gbm', pspm = 'ajay')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment  where ep.pdept = 'gbm' and ep.pspm = 'ajay'
insert overwrite table kiran.employee_ext_part
partition (pdept='rw' , pspm='prashanth')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='rw' and ep.pspm='prashanth'
insert overwrite table kiran.employee_ext_part
partition (pdept='test' , pspm='test')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='test' and ep.pspm='test'
--insert overwrite table kiran.employee_ext_part
--partition (pdept='test1' , pspm='test1')
--select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='test1' and ep.pspm='test1'
;

hive> select * from employee_ext_part;
OK
employee_ext_part.id    employee_ext_part.name  employee_ext_part.dept  employee_ext_part.skill employee_ext_part.sal   employee_ext_part.mgr   employee_ext_part.spm   employee_ext_part.comment       employee_ext_part.pdept        employee_ext_part.pspm
11    devillers gbm     plsql   1000.0  brijesh ajay            NULL    gbm     ajay
12      fafdu   gbm     plsql   5000.0  kiran   ajay            NULL    gbm     ajay
13      steyn   gbm     ba      10000.0 sudeep  ajay            NULL    gbm     ajay
18      duminy  gbm     hr     100001.0 smith   ajay            NULL    gbm     ajay
15      albe    rw      testing 100.0   venu    prashanth       NULL    rw      prashanth
19      miller  rw      testing 1000.0  ram     prashanth       NULL    rw      prashanth
20      pointin rw      testing 8989.0  ram     prashanth       NULL    rw      prashanth
21      rhodes  rw      tesging 9090.0  ram     prashanth       NULL    rw      prashanth
15      albe    rw      testing 100.0   venu    prashanth       NULL    test    test
19      miller  rw      testing 1000.0  ram     prashanth       NULL    test    test
20      pointin rw      testing 8989.0  ram     prashanth       NULL    test    test
21      rhodes  rw      tesging 9090.0  ram     prashanth       NULL    test    test
Time taken: 0.295 seconds, Fetched: 12 row(s)
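Can anyone tell me what is going wrong? Is a multi-insert supposed to behave like this when one of its queries returns no rows, or am I missing something?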


BTW, sorry about the column headers; I couldn't get them to align properly.

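Disclaimer: I'm not a big fan of multi-table inserts, especially when the "multiple tables" are all the same table with different partitions.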

If you can't fix your script, why not try a more direct approach, such as:

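-- enable dynamic partitioning (already the default since Hive 0.9) and allow all partition columns to be dynamic
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- "insert overwrite into" is not valid HiveQL; the form is "insert overwrite table"
insert overwrite table KIRAN.EMPLOYEE_EXT_PART
partition (PDEPT, PSPM)
-- dynamic partition columns must come last, in the same order as the PARTITION clause
select ID, NAME, DEPT, SKILL, SAL, MGR, SPM, COMMENT,
       PDEPT, PSPM
from KIRAN.EMPLOYEE_PART
--where ....
;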
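One behavioural difference worth knowing: with dynamic partitions, an empty source partition such as test1 simply produces no target partition, because INSERT OVERWRITE only replaces the partitions that actually receive rows. If you would rather keep static partitions, another option in the same spirit as the disclaimer is to run each branch as its own INSERT OVERWRITE instead of one multi-insert. The sketch below uses only the table, column and partition names from the question; whether the empty test1 branch still leaves an all-NULL row on this particular build (Hive 0.14 on MapReduce, text format) is an assumption you would have to test.

```sql
-- Sketch: the four static-partition loads as independent statements
-- (not a verified fix; names are taken from the question).
INSERT OVERWRITE TABLE kiran.employee_ext_part PARTITION (pdept = 'gbm', pspm = 'ajay')
SELECT ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment
FROM kiran.employee_part ep
WHERE ep.pdept = 'gbm' AND ep.pspm = 'ajay';

INSERT OVERWRITE TABLE kiran.employee_ext_part PARTITION (pdept = 'rw', pspm = 'prashanth')
SELECT ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment
FROM kiran.employee_part ep
WHERE ep.pdept = 'rw' AND ep.pspm = 'prashanth';

INSERT OVERWRITE TABLE kiran.employee_ext_part PARTITION (pdept = 'test', pspm = 'test')
SELECT ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment
FROM kiran.employee_part ep
WHERE ep.pdept = 'test' AND ep.pspm = 'test';

-- The branch whose source is empty now runs as a separate job,
-- so it cannot share output with the three statements above.
INSERT OVERWRITE TABLE kiran.employee_ext_part PARTITION (pdept = 'test1', pspm = 'test1')
SELECT ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment
FROM kiran.employee_part ep
WHERE ep.pdept = 'test1' AND ep.pspm = 'test1';
```

The trade-off is that the source table is scanned four times instead of once, which is exactly what multi-insert is meant to avoid; for small tables like this one that cost is negligible.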

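Comment #1: your first script stops abruptly at an INSERT OVERWRITE with no SELECT after it (?!?), but the second script shows the commented-out SELECT. Please make sure you post the actual script that was run.

Comment #2: are you using TEZ or MapReduce?

Comment #3: what is the table format: text, AVRO, Sequence, ORC, Parquet? And the column types: all strings?

Sorry for the trouble; I have edited the post. I'm using MR, and the table format is text. I'm well aware of this approach, but I just want to understand why the values are NULL. Ideally that shouldn't happen, right?

There are hundreds of bugs in every Hive version: some related to a specific SerDe, some to a specific feature, some to a specific execution engine (e.g. TEZ), some to a specific set of configuration properties, and some to a combination of several factors. IMHO there is no point trying to "understand" an issue that may be specific to your current setup, unless (a) you must find a workaround, or (b) you have the skills to debug the damn thing and contribute a bug fix to Apache.

Yes, every version has many bugs. The point is that if something doesn't work as expected, one should make sure the implementation is not at fault before concluding that it is a bug in the tool itself. That's what I'm trying to do.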
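As a starting point for that kind of verification, the queries below are a diagnostic sketch (HiveQL, using the table and partition names from the question) that check whether the test1 source partition really is empty and what each target partition contains after the multi-insert. The DESCRIBE FORMATTED output includes the partition's HDFS location, which can then be listed with dfs -ls from the Hive CLI to see which files the job actually wrote; the <location> placeholder is not a real path and must be filled in from that output.

```sql
-- 1. Confirm that the source for the last branch is empty (expect 0).
SELECT count(*) FROM kiran.employee_part
WHERE pdept = 'test1' AND pspm = 'test1';

-- 2. List the partitions that exist in the target after the run.
SHOW PARTITIONS kiran.employee_ext_part;

-- 3. Row count per target partition; after the failing run above,
--    each partition should report the single all-NULL row.
SELECT pdept, pspm, count(*) AS cnt
FROM kiran.employee_ext_part
GROUP BY pdept, pspm;

-- 4. Show partition metadata, including its HDFS "Location:".
DESCRIBE FORMATTED kiran.employee_ext_part PARTITION (pdept = 'gbm', pspm = 'ajay');

-- 5. Then inspect the files in that directory (replace the placeholder):
-- dfs -ls <location>;
```

Comparing the file listing of a partition after the failing run with the listing after the working run (test1 branch commented out) should show whether the data files were replaced, which helps separate a script error from a tool bug.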