Hadoop Hive 0.14.0.2.2.4.10-1: Multi-insert - empty partitions

hadoop hive

I am trying to do a multi-insert using the query below:

From kiran.employee_part ep
insert overwrite table kiran.employee_ext_part
partition (pdept = 'gbm', pspm = 'ajay')
select ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment where ep.pdept = 'gbm' and ep.pspm = 'ajay'
insert overwrite table kiran.employee_ext_part
partition (pdept = 'rw', pspm = 'prashanth')
select ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment where ep.pdept = 'rw' and ep.pspm = 'prashanth'
insert overwrite table kiran.employee_ext_part
partition (pdept = 'test', pspm = 'test')
select ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment where ep.pdept = 'test' and ep.pspm = 'test'
insert overwrite table kiran.employee_ext_part
partition (pdept = 'test1', pspm = 'test1')
select ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment where ep.pdept = 'test1' and ep.pspm = 'test1';

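The last query, select ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment where ep.pdept = 'test1' and ep.pspm = 'test1', returns no rows, as expected. Each of the other select queries returns a few rows. After executing the multi-insert above, every data column in my kiran.employee_ext_part table is NULL, as shown below: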

hive> select * from employee_ext_part;
OK
employee_ext_part.id employee_ext_part.name employee_ext_part.dept employee_ext_part.skill employee_ext_part.sal employee_ext_part.mgr employee_ext_part.spm employee_ext_part.comment employee_ext_part.pdept employee_ext_part.pspm
NULL NULL NULL NULL NULL NULL NULL NULL gbm ajay
NULL NULL NULL NULL NULL NULL NULL NULL rw prashanth
NULL NULL NULL NULL NULL NULL NULL NULL test test
Time taken: 8.116 seconds, Fetched: 3 row(s)

If I comment out the last insert and run the statement again, the table is populated with the expected values:

From kiran.employee_part ep
insert overwrite table kiran.employee_ext_part
partition (pdept = 'gbm', pspm = 'ajay')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept = 'gbm' and ep.pspm = 'ajay'
insert overwrite table kiran.employee_ext_part
partition (pdept='rw' , pspm='prashanth')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='rw' and ep.pspm='prashanth'
insert overwrite table kiran.employee_ext_part
partition (pdept='test' , pspm='test')
select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='test' and ep.pspm='test'
--insert overwrite table kiran.employee_ext_part
--partition (pdept='test1' , pspm='test1')
--select ep.id,ep.name,ep.dept,ep.skill,ep.sal,ep.mgr,ep.spm,ep.comment where ep.pdept='test1' and ep.pspm='test1'
;

hive> select * from employee_ext_part;
OK
employee_ext_part.id employee_ext_part.name employee_ext_part.dept employee_ext_part.skill employee_ext_part.sal employee_ext_part.mgr employee_ext_part.spm employee_ext_part.comment employee_ext_part.pdept employee_ext_part.pspm
11 devillers gbm plsql 1000.0 brijesh ajay NULL gbm ajay
12 fafdu gbm plsql 5000.0 kiran ajay NULL gbm ajay
13 steyn gbm ba 10000.0 sudeep ajay NULL gbm ajay
18 duminy gbm hr 100001.0 smith ajay NULL gbm ajay
15 albe rw testing 100.0 venu prashanth NULL rw prashanth
19 miller rw testing 1000.0 ram prashanth NULL rw prashanth
20 pointin rw testing 8989.0 ram prashanth NULL rw prashanth
21 rhodes rw tesging 9090.0 ram prashanth NULL rw prashanth
15 albe rw testing 100.0 venu prashanth NULL test test
19 miller rw testing 1000.0 ram prashanth NULL test test
20 pointin rw testing 8989.0 ram prashanth NULL test test
21 rhodes rw tesging 9090.0 ram prashanth NULL test test
Time taken: 0.295 seconds, Fetched: 12 row(s)

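Can anyone tell me what is going wrong? Is this how a multi-insert is supposed to behave when one of its queries returns no rows, or am I missing something?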

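By the way, sorry about the title. I could not get it to align properly.

Disclaimer: I am not a big fan of multi-table inserts, especially when the "multiple tables" are all the same table with different partitions.
If you cannot fix your script, why not try a more straightforward approach, such as: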

set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table KIRAN.EMPLOYEE_EXT_PART
partition (PDEPT, PSPM)
select ID, NAME, DEPT, SKILL, SAL, MGR, SPM, COMMENT, PDEPT, PSPM
from KIRAN.EMPLOYEE_PART
--where ....
;
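
To make the dynamic-partition approach above more concrete, here is a minimal sketch. The WHERE clause is my assumption (the original leaves it elided): it restricts the rewrite to the four partition pairs from the question. As far as I know, a dynamic-partition insert only writes the partitions that actually receive rows, so the empty ('test1', 'test1') pair should simply produce nothing. I have not verified this on 0.14.0.2.2.4.10-1.

```sql
-- Dynamic partitioning is normally enabled by default, set here explicitly to be safe
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table kiran.employee_ext_part
partition (pdept, pspm)
-- The partition columns must come last in the select list, in partition order
select ep.id, ep.name, ep.dept, ep.skill, ep.sal, ep.mgr, ep.spm, ep.comment,
       ep.pdept, ep.pspm
from kiran.employee_part ep
-- Assumed filter: only the partition pairs used in the question
where (ep.pdept = 'gbm'   and ep.pspm = 'ajay')
   or (ep.pdept = 'rw'    and ep.pspm = 'prashanth')
   or (ep.pdept = 'test'  and ep.pspm = 'test')
   or (ep.pdept = 'test1' and ep.pspm = 'test1');
```

This replaces the four static insert branches with a single scan and a single insert, so there is no separate branch that can end up selecting zero rows.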

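Comment #1: the first script stops abruptly at an INSERT OVERWRITE with no SELECT after it (?!?), yet the second script shows that SELECT commented out. Please make sure you show the actual scripts that were run. // Comment #2: are you using TEZ or MapReduce? // Comment #3: what is the storage format of the tables: Text, AVRO, Sequence, ORC, Parquet? And the column types: all String?

Sorry for the trouble. I have edited the post. I am using MR, and the table format is Text.

I am well aware of this approach, but I just want to understand why the values come out as NULL. Ideally that should not happen, right?

There are hundreds of bugs in every Hive release: some related to a specific SerDe, some to a specific feature, some to a specific execution engine (e.g. TEZ), some to a specific set of configuration properties, and some to a combination of several factors. In my opinion, there is no point in trying to "understand" an issue that may be specific to your current setup, unless (a) you have to find a workaround, or (b) you are able to debug the damn thing and contribute a bug fix to Apache.

Yes, every release has many bugs. The point is that when something does not work as expected, you should make sure your own implementation is not at fault before concluding that it is a bug in the tool itself. That is what I am trying to do.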
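
One way to run the kind of check described in the last comment, sketched under the assumption that both tables live in the kiran database as in the question. The queries only read data. The row_count and non_null_ids aliases are names I made up for illustration.

```sql
use kiran;

-- Row counts per source partition.
-- A pair that is missing here means the matching insert branch selects nothing.
select pdept, pspm, count(*) as row_count
from employee_part
group by pdept, pspm;

-- Partitions that currently exist in the target table
show partitions employee_ext_part;

-- Per target partition, compare total rows with rows that have a non-NULL id.
-- After the failing multi-insert these two numbers would be expected to differ.
select pdept, pspm, count(*) as row_count, count(id) as non_null_ids
from employee_ext_part
group by pdept, pspm;
```

Running these before and after the multi-insert, with and without the empty 'test1' branch, should show whether the NULL rows appear only when a branch selects zero rows. That would narrow the problem down without proving where the fault lies.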