Hadoop: When are bucket files created in Hive?

hadoop mapreduce hive


With bucketing, at what stage does Hive create the bucket files?

CREATE TABLE emp (id INT, name STRING, country STRING)
CLUSTERED BY (country)
INTO 2 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

If I have 20 buckets and only 4 rows, how many files will be created?

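The bucketing layout (the bucket column and the number of buckets) is fixed when the table is created, but CREATE TABLE itself only records metadata and creates the table directory. The bucket files are written when data is inserted into the table, and they live as separate files inside the table directory in the Hive warehouse. For every record inserted into a bucketed table, Hive computes a hash of the bucket column value and uses it to select the bucket file. With your 20 buckets, an insert will typically leave 20 files in the table directory, most of them empty (some Hive versions and execution engines skip the empty ones). Which files the 4 records land in depends on the result of the hash function applied to the bucket column value: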

**bucketNumber = hashFunction(record.country) mod numberOfBuckets**

You can reproduce this by following the steps described in the "Bucket table" section of this guide.
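To make the hash-to-bucket mapping concrete, here is a minimal Python sketch that distributes 4 rows over 20 buckets. It is a simulation, not Hive code: `java_style_hash` is a stand-in for Hive's hash (the real function depends on the column type and the table's bucketing version), and the sample rows and the helper names `bucket_for` and `assign_buckets` are made up for illustration.

```python
def java_style_hash(text: str) -> int:
    """Stand-in hash (assumption): 31-multiplier hash over the UTF-8 bytes,
    wrapped to a signed 32-bit integer. Intended for ASCII input only."""
    h = 0
    for b in text.encode("utf-8"):
        h = (31 * h + b) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_for(value: str, num_buckets: int) -> int:
    # Clear the sign bit so the modulo result is never negative.
    return (java_style_hash(value) & 0x7FFFFFFF) % num_buckets


def assign_buckets(rows, num_buckets):
    # One list per bucket; a bucket that receives no rows stays empty.
    buckets = {n: [] for n in range(num_buckets)}
    for row in rows:
        buckets[bucket_for(row["country"], num_buckets)].append(row)
    return buckets


# Hypothetical sample data: 4 rows, two of which share a country.
rows = [
    {"id": 1, "name": "Asha", "country": "IN"},
    {"id": 2, "name": "Bob", "country": "US"},
    {"id": 3, "name": "Carl", "country": "UK"},
    {"id": 4, "name": "Dev", "country": "IN"},
]

buckets = assign_buckets(rows, 20)
non_empty = {n: r for n, r in buckets.items() if r}

print("buckets in total:", len(buckets))
print("buckets holding rows:", sorted(non_empty))
```

Rows with the same `country` value always land in the same bucket, so 4 rows can occupy at most 4 of the 20 buckets; the remaining buckets hold no data.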