How do I find the location of a partition in Hive?

sql hadoop hive

If I write a Hive SQL statement like this:

ALTER TABLE tbl_name ADD PARTITION (dt=20131023) LOCATION 'hdfs://path/to/tbl_name/dt=20131023';

how can I look up this partition's location later? I can see that there is data at that location, but I cannot query it with a Hive statement such as:

SELECT data FROM tbl_name where dt=20131023;

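SHOW TABLE EXTENDED lists information for all tables matching the given regular expression. Users cannot use a regular expression for the table name if a partition specification is present. The command's output includes basic table information and file system information such as totalNumberFiles, totalFileSize, maxFileSize, minFileSize, lastAccessTime, and lastUpdateTime. If a partition is present, it outputs the file system information for the given partition instead of for the table.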

Run describe on the partition instead of on the whole table.
This shows the linked location if it is an external table.

describe formatted tbl_name partition (dt='20131023')

If there are multiple nested partitions, the syntax is:

describe formatted table_name partition (day=123,hour=2);

This is the command format I use to get the exact HDFS location of a specific partition in a specific table:

show table extended like flight_context_fused_record partition(date_key='20181013', partition_id='P-DUK2nESsv', custom_partition_1='ZMP');

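In the command above, the partition spec consists of three separate fields. Your case may have more or fewer.

See the results below. Note that the "location:" field shows the location of the HDFS folder.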

hive (nva_test)> show table extended like flight_context_fused_record partition(date_key='20181013', partition_id='P-DUK2nESsv', custom_partition_1='ZMP');
OK
tableName:flight_context_fused_record
owner:nva-prod
location:hdfs://hdp1-ha/tmp/vfisher/cms-context-acquisition-2019-06-13/FlightContextFusedRecord/2018/10/13/ZMP/P-DUK2nESsv
inputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
outputformat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
columns:struct columns { string primary_key, string facility, string position, i32 dalr_channel, i64 start_time_unix_millis, i64 end_time_unix_millis, string foreign_key_to_audio_segment, struct<on_frequency_flight_list:list<struct<acid:string,ac_type:string>>,transfer_list:list<struct<primary_key:string,acid:string,data_id:string,ac_type:string,from_facility:string,from_position:string,transition_time:i64,transition_time_start:i64,transtition_time_end:i64,to_facility:string,to_position:string,source:string,source_info:string,source_time:i64,confidence:double,confidence_description:string,uuid:string>>,source_list:list<string>,domain:string,domains:list<string>> flight_context}
partitioned:true
partitionColumns:struct partition_columns { i32 date_key, string partition_id, string custom_partition_1}
totalNumberFiles:1
totalFileSize:247075687
maxFileSize:247075687
minFileSize:247075687
lastAccessTime:1561122938361
lastUpdateTime:1561071155639
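When this command is run from a script rather than typed interactively, the location: line can be picked out of the key:value output. Below is a minimal Python sketch, assuming the command output has already been captured as text; the helper name extract_location is hypothetical and not part of Hive, and the sample is an abbreviated copy of the output above.

```python
def extract_location(show_extended_output):
    """Return the value of the first 'location:' line, or None if there is none."""
    for line in show_extended_output.splitlines():
        # Split on the first colon only, so the 'hdfs://...' value stays intact.
        key, sep, value = line.strip().partition(":")
        if sep and key == "location":
            return value
    return None


# Abbreviated copy of the SHOW TABLE EXTENDED output shown above.
sample = """tableName:flight_context_fused_record
owner:nva-prod
location:hdfs://hdp1-ha/tmp/vfisher/cms-context-acquisition-2019-06-13/FlightContextFusedRecord/2018/10/13/ZMP/P-DUK2nESsv
partitioned:true"""

print(extract_location(sample))
```

Splitting on the first colon only matters here because the HDFS URI itself contains colons.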

The general form of the command (with my specific values removed and placeholders put in) looks like this:

show table extended like <your table name here> partition(<your partition spec here>);

If you want to know the location of the files being read, use

SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE FROM <table> WHERE <part_name> = '<part_key>'

Then you get

hdfs:///user/hive/warehouse/<db>/<table>/<part_name>=<part_key>/000000_0.snappy, 0
hdfs:///user/hive/warehouse/<db>/<table>/<part_name>=<part_key>/000000_1.snappy, 0
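Because this query returns one row per record, the same file name repeats many times. If only the partition directory is needed, the file names can be collapsed to their parent directories. A small Python sketch, assuming the INPUT__FILE__NAME values have already been fetched into a list; the db, tbl, and dt=20131023 path components and the helper name partition_dirs are hypothetical.

```python
import posixpath


def partition_dirs(file_names):
    """Collapse INPUT__FILE__NAME values into a sorted list of unique parent directories."""
    # posixpath is used (not os.path) so '/' is the separator on every platform.
    return sorted({posixpath.dirname(name) for name in file_names})


# Hypothetical values shaped like the result rows above.
files = [
    "hdfs:///user/hive/warehouse/db/tbl/dt=20131023/000000_0.snappy",
    "hdfs:///user/hive/warehouse/db/tbl/dt=20131023/000000_1.snappy",
]

print(partition_dirs(files))
```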

You can simply do the following:

DESC FORMATTED tablename PARTITION (yr_no='y2019');

OR

DESC EXTENDED tablename PARTITION (yr_no='y2019');

You can get the location of a Hive partition on HDFS by running either of the following Hive commands:

DESCRIBE FORMATTED tbl_name  PARTITION(dt=20131023);
SHOW TABLE EXTENDED LIKE tbl_name PARTITION(dt=20131023);

Alternatively, you can get it by running the HDFS list command:

hdfs dfs -ls <your Hive store location>/<tablename>

Thanks,

NNK

You can get this information through the Hive Metastore Thrift protocol, for example:

Hive CLI:

hive> create table test_table_with_partitions(f1 string, f2 int) partitioned by (dt string);
OK
Time taken: 0.127 seconds

hive> alter table test_table_with_partitions add partition(dt=20210504) partition(dt=20210505);
OK
Time taken: 0.152 seconds

Python CLI:

>>> with client as c:
...     partition = c.get_partition_by_name(db_name='default',
...                                         tbl_name='test_table_with_partitions',
...                                         part_name='dt=20210504')
...
>>> partition.sd.location
'hdfs://hdfs.master.host:8020/user/hive/warehouse/test_table_with_partitions/dt=20210504'

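How do I list the locations of all partitions rather than just one?

@morpheus I use a for loop over the output of show partitions table to show the locations of all partitions. I could not find a one-line command that does this. For multi-level partitions, the full partition spec must be provided.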
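The for-loop approach from this comment can be sketched in Python as follows. This is only an illustration: it assumes the SHOW PARTITIONS output has already been captured as text with one partition per line in col=value/col=value form, and that the values contain no quotes or escaped characters. The helper name describe_statements is hypothetical; the generated statements would still have to be run through Hive to read each Location field.

```python
def describe_statements(table, show_partitions_output):
    """Build one DESCRIBE FORMATTED statement per line of SHOW PARTITIONS output."""
    statements = []
    for line in show_partitions_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # "day=123/hour=2" -> [["day", "123"], ["hour", "2"]]
        pairs = [part.split("=", 1) for part in line.split("/")]
        # Every partition level goes into the spec, as multi-level partitions require.
        spec = ", ".join(f"{col}='{val}'" for col, val in pairs)
        statements.append(f"DESCRIBE FORMATTED {table} PARTITION ({spec});")
    return statements


# Hypothetical SHOW PARTITIONS output for a table partitioned by day and hour.
for stmt in describe_statements("table_name", "day=123/hour=1\nday=123/hour=2"):
    print(stmt)
```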