Hive: changing a Hive table's partitioning by reducing the number of partition columns
Create statement:
CREATE EXTERNAL TABLE tab1(usr string)
PARTITIONED BY (year string, month string, day string, hour string, min string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION '/tmp/hive1';
Data:
select * from tab1;
jhon,2017,2,20,10,11
jhon,2017,2,20,10,12
jhon,2017,2,20,10,13
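Now I need to alter tab1 so that it has only 3 partition columns (year string, month string, day string), without manually copying or modifying files. I have thousands of files, so I should change only the table definition and not touch the files.
Please tell me how to do this.

If you only need to do this once, I suggest creating a new table with the desired partitions and inserting from the old table into the new one using dynamic partitioning. This also avoids keeping small files in the partitions. Another option is to create a new table with the desired partitions that points to the old location, and use the following properties: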
TBLPROPERTIES ("hive.input.dir.recursive" = "TRUE",
"hive.mapred.supports.subdirectories" = "TRUE",
"hive.supports.subdirectories" = "TRUE",
"mapred.input.dir.recursive" = "TRUE");
After that, you can run MSCK REPAIR TABLE on the new table so that Hive recognizes the partitions.
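Both options can be sketched roughly as follows. This is only an illustration under assumptions: the table names tab1_daily and tab2 and the location /tmp/hive1_daily are hypothetical, and the second option reuses the properties quoted above as given, without confirming that every Hive version honors them at table level.

```sql
-- Option 1: one-time copy into a new table using dynamic partitioning.
-- tab1_daily and /tmp/hive1_daily are hypothetical names.
CREATE EXTERNAL TABLE tab1_daily(usr string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION '/tmp/hive1_daily';

-- Let Hive derive every partition value from the query result.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Dynamic partition columns go last in the SELECT, in PARTITION order.
-- hour and min are dropped here; add them as regular columns if needed.
INSERT OVERWRITE TABLE tab1_daily PARTITION (year, month, day)
SELECT usr, year, month, day
FROM tab1;

-- Option 2: new table definition over the existing files (no data copied).
-- tab2 is a hypothetical name; the location is the one tab1 already uses.
CREATE EXTERNAL TABLE tab2(usr string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION '/tmp/hive1'
TBLPROPERTIES ("hive.input.dir.recursive" = "TRUE",
               "hive.mapred.supports.subdirectories" = "TRUE",
               "hive.supports.subdirectories" = "TRUE",
               "mapred.input.dir.recursive" = "TRUE");

-- Register the year/month/day directories found under the location.
MSCK REPAIR TABLE tab2;
SHOW PARTITIONS tab2;
```

With the second option the hour and min directories are read as plain subdirectories, so their values are no longer available as columns. If the table-level properties do not seem to take effect, setting the same keys per session with SET before querying is a commonly used alternative worth trying.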