Hive: is there a way to populate a newly added column with data in partitions that already exist?


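I am using Hive's dynamic partitioning, and I have run into a problem: a newly added column is not populated with data unless a new partition is added. I put together a small demo to show it.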

1. create table vegetables (name string, count bigint) partitioned by (year int, month int, day int);

2. create external table dataForVeg (name string, count bigint, weight string, year int, month int, day int) row format delimited fields terminated by ' ';

3. load data1 into dataforveg

4. set hive.exec.dynamic.partition.mode=nonstrict;

5. insert into table vegetables partition(year, month, day) select name, count, year, month, day from dataforveg;

6. hive> select * from vegetables where day='5';
    tomato 5 2013 11 5
    cabbage 3 2013 11 5

7. hive> alter table vegetables add columns(weight double);

8. hive> describe vegetables ;
    name string
    count bigint
    weight double
    year int
    month int
    day int

9. hive> select * from vegetables where day='5';
    tomato 5 NULL 2013 11 5
    cabbage 3 NULL 2013 11 5
    hive> select * from vegetables where day='4';
    potato 2 NULL 2013 11 4

10. load overwrite data2 into dataforveg

11. hive> select * from dataforveg;
    carrot 10 5 2013 11 5
    pepper 15 2 2013 11 5

12. hive> select * from vegetables where day='5';
    tomato 5 NULL 2013 11 5
    cabbage 3 NULL 2013 11 5
    carrot 10 NULL 2013 11 5
    pepper 15 NULL 2013 11 5

13. load overwrite data3 into dataforveg
    hive> select * from dataforveg;
    beet 4 1 2013 11 6
    broccoli 3 1 2013 11 6

14. hive> select * from vegetables;
    potato 2 NULL 2013 11 4
    tomato 5 NULL 2013 11 5
    cabbage 3 NULL 2013 11 5
    carrot 10 NULL 2013 11 5
    pepper 15 NULL 2013 11 5
    beet 4 1.0 2013 11 6
    broccoli 3 1.0 2013 11 6

As the example shows, the new column is populated only when a new partition is added (day 6). Rows inserted into the existing day 5 partition after the column was added still show NULL. Question: is there a way to refresh the value of the new field "weight" for carrot and pepper in step 12?

In other words, is there a way to populate the newly added column with data in partitions that already exist?

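The underlying file system, HDFS, does not support updating files, or even appending to them. In that case your only option is to write a MapReduce job that merges the data in the old partition with the values for the new column, and then replace the files in that partition with the output of that job.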

If you would rather not write a MapReduce job, you could probably rig something up by combining a Hive CTAS (CREATE TABLE AS SELECT) with HDFS operations.
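A rough sketch of what the CTAS route could look like for the day 5 partition from the question. It makes several assumptions: the staging table name vegetables_stage is made up for illustration, the weights are assumed to still be available in dataForVeg and to match on name and date, and the final swap uses INSERT OVERWRITE on the partition instead of moving files around in HDFS by hand. Treat it as a starting point, not a tested recipe.

```sql
-- Hypothetical staging table: old rows of the partition joined to their weights.
-- Rows with no match in dataForVeg (tomato, cabbage) keep weight = NULL.
CREATE TABLE vegetables_stage AS
SELECT v.name,
       v.`count`,
       CAST(d.weight AS DOUBLE) AS weight
FROM vegetables v
LEFT OUTER JOIN dataForVeg d
  ON  v.name  = d.name
  AND v.year  = d.year
  AND v.month = d.month
  AND v.day   = d.day
WHERE v.year = 2013 AND v.month = 11 AND v.day = 5;

-- Rewrite the existing partition with the merged rows.
-- This stands in for the manual HDFS file replacement.
INSERT OVERWRITE TABLE vegetables PARTITION (year = 2013, month = 11, day = 5)
SELECT name, `count`, weight
FROM vegetables_stage;

-- Clean up the staging table once the partition has been verified.
DROP TABLE vegetables_stage;
```

One caveat: on some Hive versions an existing partition keeps the column list it was created with, so the new column can still read as NULL after the rewrite. If that happens, the partition's own schema has to be brought up to date as well; Hive 1.1.0 and later accept ALTER TABLE ... ADD COLUMNS ... CASCADE for this.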

Thanks for the reply, I'll give it a try.