Setting the partition location in an "INSERT OVERWRITE" dynamic partition query in Hive


hadoop hive


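I created a Hive table whose base location points to an AWS S3 location. However, I want to use an "INSERT OVERWRITE" query to create one of its partitions on the HDFS cluster instead.

The steps are as follows: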

-- Create intermediate table
create table test_int_ash
( loc string)
partitioned by (name string, age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile
location '/user/ash/test_int';

-- Insert into intermediate table with two names, 'rash' and 'nash'
INSERT INTO test_int_ash partition (name="rash",age=20) values ('brisbane');
INSERT INTO test_int_ash partition (name="rash",age=30) values ('Sydney');
INSERT INTO test_int_ash partition (name="rash",age=40) values ('Melbourne');
INSERT INTO test_int_ash partition (name="rash",age=50) values ('Perth');

INSERT INTO test_int_ash partition (name="nash",age=50) values ('Auckland');
INSERT INTO test_int_ash partition (name="nash",age=40) values ('Wellington');


-- create curated table
create external table test_curated_ash
( loc string)
partitioned by (name string, age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile
location 's3a://mybucket/tmp/test_curated/'; 

-- Load curated table from intermediate table using the dynamic partition method; this creates partitions on AWS S3.
insert overwrite table test_curated_ash partition(name='rash',age)
select loc,age from test_int_ash where name='rash' ;

-- I want to keep this partition on the HDFS cluster; the query below doesn't work

insert overwrite table test_curated_ash partition(name='nash',age) location 'hdfs://mynamenode/user/ash/test_curated_new'
select loc,age from test_int_ash where name='nash';

The queries below work, but I don't want to handle this with the "static partition" approach:

alter table test_curated_ash add partition(name='nash',age=40) location 'hdfs://swmcdh1/user/contexti/ash/test_curated_new/name=nash/age=40';
alter table test_curated_ash add partition(name='nash',age=50) location 'hdfs://swmcdh1/user/contexti/ash/test_curated_new/name=nash/age=50';

insert overwrite table test_curated_ash partition(name='nash',age)
select loc,age from test_int_ash where name='nash';

Can you help me set the partition location in an "INSERT OVERWRITE" dynamic partition query?

Suppose I have a table named "user" and I want to partition it dynamically by the country column.

Query:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=1000;

INSERT overwrite TABLE partitioned_user
    PARTITION (country)
        SELECT  firstname ,lastname,address,city,salary ,post,phone1,phone2,email,
        web,country FROM user;

When inserting data into partitions, the partition column must be the last column in the SELECT list.
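The same rule with the two-level partitioning from the question can be sketched as follows. This is a minimal sketch, assuming the test_int_ash and test_curated_ash tables defined in the question. Hive matches the trailing SELECT columns to the dynamic partition columns by position, not by name, so they should appear in the same order as in the PARTITION clause.

```sql
-- Allow every partition column to be dynamic.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Both partition columns dynamic: name and age come last,
-- in the same order as in the PARTITION clause.
insert overwrite table test_curated_ash partition (name, age)
select loc, name, age from test_int_ash;

-- Mixed form: the static column (name) precedes the dynamic one (age),
-- and only the dynamic column appears in the SELECT list.
insert overwrite table test_curated_ash partition (name='rash', age)
select loc, age from test_int_ash where name='rash';
```

In both statements the partition directories are created under the table's base location, which is why this alone does not place a partition on a different file system.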

Set hive.exec.dynamic.partition.mode=nonstrict if it is currently set to strict.

In MapReduce strict mode (hive.mapred.mode=strict), some risky queries are not allowed to run. These include:

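1. Cartesian products

2. No partition being picked up for a query

3. Comparing bigints and strings

4. Comparing bigints and doubles

5. ORDER BY without LIMIT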

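According to points 2 and 5, on a partitioned table we cannot run a SELECT statement without at least one partition key filter (such as WHERE country='US'), nor use an ORDER BY clause without a LIMIT. In older Hive releases (0.x and 1.x) this property defaults to nonstrict; check the default for your version.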
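To make points 2 and 5 concrete, here is a minimal sketch using the partitioned_user table from the query above. Each statement is meant to be run on its own. Exact behaviour may differ by Hive version, since some later releases replace this single switch with finer-grained hive.strict.checks.* properties.

```sql
set hive.mapred.mode=strict;

-- Rejected in strict mode: no partition key filter on a partitioned table.
select firstname, lastname from partitioned_user;

-- Rejected in strict mode: ORDER BY without LIMIT.
select firstname, salary from partitioned_user
where country='US'
order by salary;

-- Accepted: partition filter present and ORDER BY bounded by LIMIT.
select firstname, salary from partitioned_user
where country='US'
order by salary desc
limit 10;
```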

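You can use another intermediate table to create the partitioned data on HDFS.

Then, in the final table, change the partition's location to point to that different location, as follows: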

USE dbname; ALTER TABLE table_name PARTITION (partname=value) SET LOCATION "location";
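Applied to the tables in the question, this approach might look like the sketch below. The staging table name test_curated_stage_ash is hypothetical, and the HDFS paths reuse the hdfs://mynamenode/user/ash/test_curated_new location from the question. The dynamic insert writes the name=nash/age=<value> directories on HDFS, and the final table's partitions are then pointed at them. One metadata statement per partition is still needed, and the partition spec must name every partition column.

```sql
-- Hypothetical staging table whose base location is on HDFS.
create external table test_curated_stage_ash
( loc string)
partitioned by (name string, age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile
location 'hdfs://mynamenode/user/ash/test_curated_new';

set hive.exec.dynamic.partition=true;

-- Dynamic partition insert: the partition directories are created on HDFS.
insert overwrite table test_curated_stage_ash partition (name='nash', age)
select loc, age from test_int_ash where name='nash';

-- Register the HDFS directories as partitions of the final table.
alter table test_curated_ash add if not exists
partition (name='nash', age=40)
location 'hdfs://mynamenode/user/ash/test_curated_new/name=nash/age=40'
partition (name='nash', age=50)
location 'hdfs://mynamenode/user/ash/test_curated_new/name=nash/age=50';

-- For a partition that already exists (for example one previously written
-- to S3), change its location instead of adding it.
alter table test_curated_ash partition (name='nash', age=40)
set location 'hdfs://mynamenode/user/ash/test_curated_new/name=nash/age=40';

-- Check where a partition now points.
describe formatted test_curated_ash partition (name='nash', age=40);
```

SET LOCATION only updates metadata; it does not move any existing files, so the data has to be present at the new location already.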

Alternatively, you can directly update the Hive metastore table SDS for the appropriate SD_ID.
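A rough sketch of what such a metastore update could look like follows. It assumes a MySQL-backed metastore with the standard DBS, TBLS, PARTITIONS and SDS tables, that the table lives in the default database, and the S3 and HDFS prefixes from the question. These statements run against the metastore's backing database, not in Hive, and they bypass Hive's own checks, so they are best tried on a backup first.

```sql
-- 1. Inspect the storage descriptors of the partitions to be repointed.
SELECT p.PART_NAME, s.SD_ID, s.LOCATION
FROM PARTITIONS p
JOIN TBLS t ON t.TBL_ID = p.TBL_ID
JOIN DBS d ON d.DB_ID = t.DB_ID
JOIN SDS s ON s.SD_ID = p.SD_ID
WHERE d.NAME = 'default'
  AND t.TBL_NAME = 'test_curated_ash'
  AND p.PART_NAME LIKE 'name=nash/%';

-- 2. Rewrite the location prefix for exactly those storage descriptors.
UPDATE SDS
SET LOCATION = REPLACE(LOCATION,
                       's3a://mybucket/tmp/test_curated',
                       'hdfs://mynamenode/user/ash/test_curated_new')
WHERE SD_ID IN (
  SELECT p.SD_ID
  FROM PARTITIONS p
  JOIN TBLS t ON t.TBL_ID = p.TBL_ID
  JOIN DBS d ON d.DB_ID = t.DB_ID
  WHERE d.NAME = 'default'
    AND t.TBL_NAME = 'test_curated_ash'
    AND p.PART_NAME LIKE 'name=nash/%'
);
```

Because the filter works on the partition name prefix, a single UPDATE covers all age sub-partitions under name=nash without listing them one by one.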

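I appreciate your answer, but I am looking for a way to set the partition location while running "INSERT OVERWRITE" with the dynamic partition method. Unfortunately, I cannot specify the partition location as the HDFS cluster. If you read my code, you will see what I am trying to do.

Hi Saurav, I hope you read my question correctly. There are multiple sub-partitions that are dynamic, which I cannot hard-code (partname=value in your example). With a single partition column this would be easy, but with multiple dynamic sub-partitions I cannot do it. Anyway, I have found another option and will write a blog post about it shortly. Thanks.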