Hadoop Hive dynamic partitioning not creating the correct partitions

I am trying to insert data into a partitioned table, but not all partitions are being created (only NULL and zero partitions are created). See below:

hive>

select state_code,district_code,count(*) from marital_status group by state_code,district_code;
Total MapReduce jobs = 1
insert overwrite table marital_status_part partition(DISTRICT_CODE) SELECT * FROM MARITAL_STATUS WHERE DISTRICT_CODE IN ('532','533','534');
Total MapReduce jobs = 3
Launching Job 1 out of 3
...
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 3.49 sec   HDFS Read: 193305 HDFS Write: 240 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 490 msec
OK
28  000 60
28  532 60
28  533 60
28  534 60
28  535 60
28  536 60
28  537 60
28  538 60
28  539 60
28  540 60
28  541 60
28  542 60
28  543 60
28  544 60
28  545 60
28  546 60
28  547 60
28  548 60
28  549 60
28  550 60
28  551 60
28  552 60
28  553 60
28  554 60
Time taken: 39.442 seconds, Fetched: 24 row(s)
I am now inserting this table's data into another table partitioned by district_code.

hive>

select state_code,district_code,count(*) from marital_status group by state_code,district_code;
Total MapReduce jobs = 1
insert overwrite table marital_status_part partition(DISTRICT_CODE) SELECT * FROM MARITAL_STATUS WHERE DISTRICT_CODE IN ('532','533','534');
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_201507071409_0020, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201507071409_0020
Kill Command = /home/chaitanya/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201507071409_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-07 16:35:38,180 Stage-1 map = 0%,  reduce = 0%
2015-07-07 16:35:48,214 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:49,217 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:50,220 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:51,222 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:52,226 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:53,234 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
2015-07-07 16:35:54,237 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.01 sec
MapReduce Total cumulative CPU time: 2 seconds 10 msec
Ended Job = job_201507071409_0020
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://localhost:9000/tmp/hive-chaitanya/hive_2015-07-07_16-35-29_099_2560746659196071718-1/-ext-10000
Loading data to table default.marital_status_part partition (district_code=null)
    Loading partition {district_code=0}
Partition default.marital_status_part{district_code=0} stats: [num_files: 1, num_rows: 0, total_size: 22882, raw_data_size: 0]
Table default.marital_status_part stats: [num_partitions: 1, num_files: 1, num_rows: 0, total_size: 22882, raw_data_size: 0]
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 2.01 sec   HDFS Read: 193305 HDFS Write: 22882 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 10 msec
OK
Time taken: 26.254 seconds

What should actually have happened is that three folders should have been created for 532, 533, and 534, but only two folders (NULL & zero) were created. Can you help me with this?

Hive partitions can be thought of as "virtual" columns. On HDFS they are separated into different directories. The partition value is taken from the last entry of the select. Without knowing more about your table's columns, the query below should work with a slight modification:
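To illustrate the directory layout (the warehouse path shown is an assumption based on the default Hive warehouse location), a successful dynamic-partition insert for those three district codes would produce one subdirectory per partition value on HDFS, roughly:

```
/user/hive/warehouse/marital_status_part/district_code=532/
/user/hive/warehouse/marital_status_part/district_code=533/
/user/hive/warehouse/marital_status_part/district_code=534/
```

Seeing `district_code=null` or `district_code=0` instead means the last column of the select was not actually the district code.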

INSERT OVERWRITE TABLE marital_status_part PARTITION (district_code)
SELECT col1, col2, ..., coln, district_code
FROM marital_status
WHERE district_code IN ('532', '533', '534');

In this insert, note that district_code is the last column in the select. This last column will be used as the district_code in partition(district_code). You need to make sure that the number of columns you select matches the number of columns in the target table, plus the column you are partitioning by.
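As a concrete sketch (the column names here are hypothetical, since the actual schema of marital_status was not posted; adjust them to the real table definition):

```sql
-- Hypothetical schema: marital_status(state_code, district_code, population)
-- and marital_status_part(state_code, population) PARTITIONED BY (district_code).
-- The partition column must be the LAST column in the SELECT list.
INSERT OVERWRITE TABLE marital_status_part PARTITION (district_code)
SELECT state_code,
       population,
       district_code      -- last column feeds PARTITION (district_code)
FROM   marital_status
WHERE  district_code IN ('532', '533', '534');
```

Using `SELECT *` only works if district_code happens to be the last column of the source table; otherwise whatever column comes last is silently used as the partition value, which is consistent with the NULL/zero partitions you are seeing.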


For more information, please refer to:

Did you execute the following commands?

set hive.exec.dynamic.partition=true;

set hive.exec.dynamic.partition.mode=nonstrict;

This is because static partitioning is enabled by default, which can cause the problem you are facing.
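A minimal session sketch combining both settings with the insert (the SELECT list is abbreviated and hypothetical; the partition column must still come last):

```sql
-- Session-level settings, run before the dynamic-partition insert:
SET hive.exec.dynamic.partition=true;            -- enable dynamic partition inserts
SET hive.exec.dynamic.partition.mode=nonstrict;  -- allow all partition columns to be dynamic

INSERT OVERWRITE TABLE marital_status_part PARTITION (district_code)
SELECT col1, col2, district_code                 -- hypothetical columns; district_code last
FROM   marital_status
WHERE  district_code IN ('532', '533', '534');
```

In strict mode, Hive requires at least one static partition value, so without `nonstrict` a fully dynamic insert like this is rejected.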

(Unable to format the text above, as I am answering this from my phone.)