Hive: changing a Hive table's partitioning by reducing the number of partition columns


Create statement:

CREATE EXTERNAL TABLE tab1(usr string)  
                PARTITIONED BY (year string, month string, day string, hour string, min string) 
                ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' 
                LOCATION '/tmp/hive1';

Data:

select * from tab1;

jhon,2017,2,20,10,11
jhon,2017,2,20,10,12
jhon,2017,2,20,10,13

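Now I need to alter the tab1 table so that it has only 3 partition columns (year string, month string, day string), without manually copying or modifying files. I have thousands of files, so I should change only the table definition and leave the files untouched.

Please tell me how to do this.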

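If you only need to do this once, I suggest creating a new table with the desired partition columns and inserting from the old table into the new one using dynamic partitioning. This also avoids keeping small files in the partitions.

Another option is to create a new table that points to the old location, declares the desired partition columns, and uses the following properties: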

TBLPROPERTIES ("hive.input.dir.recursive" = "TRUE", 
"hive.mapred.supports.subdirectories" = "TRUE",
"hive.supports.subdirectories" = "TRUE", 
"mapred.input.dir.recursive" = "TRUE");
After that, you can run MSCK REPAIR TABLE on the new table so that Hive recognizes the partitions.
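
As a rough illustration of the first option (new table plus dynamic-partition insert), a minimal sketch might look like the following. The table name tab1_daily and the location /tmp/hive1_daily are hypothetical and not from the original post. Note that this approach rewrites the data into new files, and that the hour and min values are dropped unless they are added to the new table as regular columns.

```sql
-- Hypothetical target table: the name tab1_daily and its location are assumptions.
CREATE EXTERNAL TABLE tab1_daily(usr string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION '/tmp/hive1_daily';

-- Allow all partition columns to be resolved dynamically from the SELECT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Partition columns must come last in the SELECT, in PARTITION-clause order.
-- hour and min from tab1 are intentionally not carried over in this sketch.
INSERT OVERWRITE TABLE tab1_daily PARTITION (year, month, day)
SELECT usr, year, month, day
FROM tab1;
```

Once the copy has been verified, the old table can be dropped. Because tab1 is EXTERNAL, dropping it removes only the metadata, and the original files under /tmp/hive1 stay in place.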