Hive partitioning and bucketing

I'm new to Hive and want to load a bucketed table from a flat table. My flat table is:

create table data(auth string, file string, documents string)
row format delimited
fields terminated by '\t' ;

My bucketed table is:

create table test(auth string, documents string)
partitioned by (file string)
clustered by(auth) into 2 buckets ;
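
To see why this schema does not group files by author, here is a small simulation of where each row lands. It is a sketch under stated assumptions: it uses Hive's legacy bucketing rule, (hash & Integer.MAX_VALUE) % num_buckets, where the string hash matches Java's String.hashCode() for ASCII text. Newer Hive versions may use a different hash, so exact bucket numbers can differ there. The helper names (java_string_hash, bucket_for, layout) and the sample file names f1..f10 are hypothetical.

```python
from collections import defaultdict


def java_string_hash(s: str) -> int:
    """Java's String.hashCode() for ASCII text: h = 31*h + c, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(value: str, num_buckets: int) -> int:
    """Assumed legacy Hive rule: (hash & Integer.MAX_VALUE) % num_buckets."""
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets


def layout(rows, partition_col, cluster_col, num_buckets):
    """Group rows by (partition value, bucket index), i.e. by output file."""
    files = defaultdict(list)
    for row in rows:
        key = (row[partition_col], bucket_for(row[cluster_col], num_buckets))
        files[key].append(row)
    return dict(files)


# Hypothetical sample data: authors A and B, each with files named f1..f10.
rows = [{"auth": a, "file": f"f{i}", "documents": f"{a}-doc{i}"}
        for a in ("A", "B") for i in range(1, 11)]

# The question's table: PARTITIONED BY (file) CLUSTERED BY (auth) INTO 2 BUCKETS
question_layout = layout(rows, "file", "auth", 2)
partitions = sorted({p for p, _ in question_layout})
print(len(partitions))              # 10: one partition per file name, not per author
print(question_layout[("f1", 1)])   # author A's f1 row, since hash("A") = 65 and 65 % 2 = 1
```

Partitioning by file creates one directory per file name, and the author only decides which of the two bucket files a row goes to inside that directory.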

I have two authors, A and B, with 10 files each.

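When I insert the data into the bucketed table, the query runs successfully. The problem is that I want all 10 files of each author to be in the same partition, but instead I get a single file containing the contents of all 10 files.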

I assume the following table structures. Flat table:

CREATE TABLE flattable (id INT, author STRING, book STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Bucketed table:

CREATE TABLE bucketedtable (id INT, book STRING)
partitioned by (author STRING)
CLUSTERED BY (book) INTO 10 BUCKETS;

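Set these properties in Hive (hive.enforce.bucketing is only needed on Hive 1.x; Hive 2.0 and later always enforce bucketing):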

set hive.enforce.bucketing = true; 
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

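Insert into the bucketed table from the flat table. With dynamic partitioning, the partition column (author) must be the last column in the SELECT list: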

INSERT INTO TABLE bucketedtable
PARTITION (author)
SELECT  id, book, author
FROM flattable;

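You only need to swap the PARTITIONED BY and CLUSTERED BY columns: partition by the author and cluster (bucket) by the book/file.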
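
A sketch of the swapped layout, simulating what the INSERT above produces for bucketedtable (PARTITIONED BY author, CLUSTERED BY book INTO 10 BUCKETS). As before, this assumes Hive's legacy bucketing hash (Java String.hashCode() for ASCII strings, then (hash & Integer.MAX_VALUE) % num_buckets); the helper names and the sample book titles are hypothetical.

```python
from collections import defaultdict


def java_string_hash(s: str) -> int:
    """Java's String.hashCode() for ASCII text: h = 31*h + c, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(value: str, num_buckets: int) -> int:
    """Assumed legacy Hive rule: (hash & Integer.MAX_VALUE) % num_buckets."""
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets


# Hypothetical flattable rows (id, author, book): 10 books per author.
flat = []
for author in ("A", "B"):
    for i in range(1, 11):
        flat.append((len(flat) + 1, author, f"{author}_book{i}"))

# Outer key = partition directory (author=...), inner key = bucket index within it.
table = defaultdict(lambda: defaultdict(list))
for _id, author, book in flat:
    table[author][bucket_for(book, 10)].append(book)

for author in sorted(table):
    buckets = table[author]
    print(f"author={author}: {len(buckets)} non-empty bucket(s)")
    for b in sorted(buckets):
        print(f"  bucket {b}: {buckets[b]}")
```

Each author now gets its own partition, and that author's books are spread over at most 10 buckets inside it.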

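With that, I get two separate files for A and B, each containing the contents of all 10 of that author's files. I want 10 files for A and 10 files for B, each in its author's partition.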
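
Note that bucketing assigns rows by hash, so 10 buckets for 10 distinct titles does not guarantee one title per bucket file: two titles whose hashes agree modulo the bucket count always share a file. The sketch below demonstrates this under the same assumption as above (Hive's legacy hash, equal to Java String.hashCode() for ASCII strings). The helper names and the sample titles are hypothetical; "Aa" and "BB" are simply a well-known pair with the same Java hashCode.

```python
def java_string_hash(s: str) -> int:
    """Java's String.hashCode() for ASCII text: h = 31*h + c, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(value: str, num_buckets: int) -> int:
    """Assumed legacy Hive rule: (hash & Integer.MAX_VALUE) % num_buckets."""
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets


def books_per_bucket(titles, num_buckets):
    """Group titles by the bucket index they hash to."""
    groups = {}
    for title in titles:
        groups.setdefault(bucket_for(title, num_buckets), []).append(title)
    return groups


def one_title_per_bucket(titles, num_buckets):
    """True only if every title landed in a bucket of its own."""
    return all(len(g) == 1 for g in books_per_bucket(titles, num_buckets).values())


# "Aa" and "BB" share the Java hashCode 2112, so they always land in the same bucket.
print(books_per_bucket(["Aa", "BB"], 10))      # {2: ['Aa', 'BB']}
print(one_title_per_bucket(["Aa", "BB"], 10))  # False

# Hypothetical titles: how many of the 10 buckets they fill depends on their hashes.
titles = [f"A_book{i}" for i in range(1, 11)]
print(len(books_per_bucket(titles, 10)), "of 10 buckets used")
```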