How do I create a table over partitioned data in Hive?

hive

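You can try the steps given below.

Method 1

Identify the schema (column names and types, including the partition column).

Create the Hive partitioned table (make sure to add the partition column and the delimiter information).

Load the data into the partitioned table. (In this case the load file does not contain the partition column, because you hard-code its value in the Load command.)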

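create table <table_name> (col1 data_type1, col2 data_type2..)
partitioned by (part_col data_type3)
row format delimited
fields terminated by '<field_delimiter_in_your_data>';

load data inpath '/hdfs/loc/file1' into table <table_name>
partition (part_col='201601');

load data inpath '/hdfs/loc/file1' into table <table_name>
partition (part_col='201602');

load data inpath '/hdfs/loc/file1' into table <table_name>
partition (part_col='201603');

and so on, one load per partition. Note that load data inpath moves the source file into the partition directory, so in practice each statement needs its own source file.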
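Writing one load statement per partition by hand gets tedious, so the statements can be generated instead. The following is a minimal sketch, not part of the original answer: the function name build_load_statements, the table name sales, and the per-month file paths are all hypothetical. It only builds the HiveQL text; running it against Hive is left to whatever client you already use.

```python
def build_load_statements(table, part_col, files_by_partition):
    """Return one LOAD DATA statement per (partition value, HDFS path) pair."""
    statements = []
    for part_value, path in sorted(files_by_partition.items()):
        statements.append(
            f"LOAD DATA INPATH '{path}' INTO TABLE {table} "
            f"PARTITION ({part_col}='{part_value}');"
        )
    return statements


# Hypothetical table name and file layout, for illustration only.
files = {
    "201601": "/hdfs/loc/file_201601",
    "201602": "/hdfs/loc/file_201602",
}
for stmt in build_load_statements("sales", "part_col", files):
    print(stmt)
```

Each source file is used exactly once here, which matches how the load command consumes its input.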

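Method 2

Create a staging table with the same schema as the main table, but without any partitions.

Load all of the data into this table (make sure the partition column is present as one of the fields in these files).

Use a dynamic partition insert to load the data from the staging table into the main table.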

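create table <staging_table> (col1 data_type1, col2 data_type2.., part_col data_type3)
row format delimited
fields terminated by '<field_delimiter_in_your_data>';

create table <main_table> (col1 data_type1, col2 data_type2..)
partitioned by (part_col data_type3);

load data inpath '/hdfs/loc/directory/' into table <staging_table>;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

insert into table <main_table> partition (part_col)
select col1, col2, ...., part_col from <staging_table>;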

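The key aspects of Method 2 are:

Make the partition column ("part_col") available as a field in the load file.

In the final insert statement, select "part_col" as the last field in the select clause.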
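The second point is easy to get wrong, because Hive matches dynamic partition columns by their position at the end of the select list, not by name. As a rough sketch (the helper name build_dynamic_insert and the table names in the usage lines are hypothetical, not from the original answer), the statement can be assembled so that the partition column always comes last:

```python
def build_dynamic_insert(main_table, staging_table, data_cols, part_col):
    """Build the settings and the INSERT for a dynamic-partition load."""
    if part_col in data_cols:
        raise ValueError("partition column must not be listed among data columns")
    # Hive maps the trailing SELECT column to the dynamic partition column,
    # so the partition column is appended after all data columns.
    select_list = ", ".join(list(data_cols) + [part_col])
    return [
        "SET hive.exec.dynamic.partition=true;",
        "SET hive.exec.dynamic.partition.mode=nonstrict;",
        f"INSERT INTO TABLE {main_table} PARTITION ({part_col}) "
        f"SELECT {select_list} FROM {staging_table};",
    ]


# Hypothetical table names, for illustration only.
for stmt in build_dynamic_insert("main_t", "stage_t", ["col1", "col2"], "part_col"):
    print(stmt)
```

Keeping the data columns and the partition column as separate arguments makes the ordering rule explicit instead of relying on the caller to remember it.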

Let's create a table partitioned by year and month, with a timestamp column in the table:

CREATE TABLE `mypart_p` (`id` bigint, `open_ts` string) PARTITIONED BY (YEAR INT, MONTH INT);

Now I have to alter the table:

ALTER TABLE mypart_p ADD PARTITION (YEAR=2020, MONTH=1);

I have to do this for every year and every month, which I did in a loop in Python. After that, the table can be filled with data, specifying which partition each batch of data belongs to.
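The answer mentions a Python loop but does not show it. One possible shape for it, as a sketch only: the function name build_add_partitions is hypothetical, and IF NOT EXISTS is an addition of this sketch (it is valid Hive syntax) so that the statements can be re-run without failing on partitions that already exist.

```python
def build_add_partitions(table, years, months=range(1, 13)):
    """Return one ALTER TABLE ... ADD PARTITION statement per year/month pair."""
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (YEAR={y}, MONTH={m});"
        for y in years
        for m in months
    ]


# mypart_p is the table from the answer; the year list is illustrative.
statements = build_add_partitions("mypart_p", [2020])
for stmt in statements:
    print(stmt)
```

The generated statements can then be passed to Hive in one script rather than typed one by one.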

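I have posted a full picture of the directory, please check. Loading each partition one by one would be hectic. Is there any simpler solution available?

OK, create a temp table and load the data into it (just specify the external directory). Then use a dynamic partition insert into the main table.

Could you elaborate on the steps, as in your first comment? I am new to this. It would help me a lot.

Thank you for answering as I asked.

Please add the description to your question so that other users can understand your problem as well. It does not show up after editing.

If it works, accept the solution!

I can't use load data because it is denying access.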
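The suggestion to "just specify the external directory" can be read as defining the staging table as an external table over the existing files, which avoids the load data step altogether because Hive reads the files in place. Whether that resolves the access error depends on the HDFS permissions involved, so treat the following as a sketch under that assumption. The helper name build_external_staging_ddl, the table name, the column types, and the comma delimiter are hypothetical.

```python
def build_external_staging_ddl(table, columns, location, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement over an existing HDFS directory."""
    # columns is a list of (name, hive_type) pairs; the partition field of the
    # raw files is declared here as an ordinary column.
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({col_defs}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        f"LOCATION '{location}';"
    )


# Hypothetical table name and column types; the path is the one used in the answer.
print(
    build_external_staging_ddl(
        "stage_t",
        [("col1", "INT"), ("col2", "STRING"), ("part_col", "STRING")],
        "/hdfs/loc/directory/",
    )
)
```

The dynamic partition insert from Method 2 can then select from this external table exactly as it would from a loaded staging table.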