Joining partitioned tables in Hive

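Suppose I have two partitioned tables, customers and items, both partitioned by the country and state columns.

If I want to retrieve data for a specific country and state, is this the right way to join these tables?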

select
  customers.id,
  customers.name,
  items.name,
  items.value
from
  customers
  join items
  on customers.id = items.customer_id
  and customers.country = 'USA'
  and customers.state = 'TX'
  and items.country = 'USA'
  and items.state = 'TX'

Or should these conditions go in the WHERE clause instead?

where customers.country = 'USA'
  and customers.state = 'TX'
  and items.country = 'USA'
  and items.state = 'TX'

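For simple queries, Hive pushes the predicates down before the reduce phase, so in this case putting the conditions in the ON clause or in the WHERE clause gives the same performance. However, if you write other queries that compare fields between tables (table1.a …

We can join partitioned tables. A partition is just a folder structure: partitioning is a way of dividing a table into related parts based on the values of particular columns, such as a date or a state. For example, I have the following partitions: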

show partitions table_name1 
year=2016/month=12/day=1/part=10

show partitions table_name2 
year=2016/month=12/day=1/part=1
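Each line above is a partition spec, which Hive stores as nested key=value directories under the table's location. When a query filters on partition columns, Hive only has to read the matching directories (partition pruning). The sketch below imitates that selection step in plain Python. parse_partition and prune are hypothetical helpers, not Hive APIs, and the second table_name1 partition is a made-up entry added so the filter has something to exclude.

```python
def parse_partition(spec):
    """Turn a Hive-style partition path such as 'year=2016/month=12' into a dict."""
    return dict(segment.split("=", 1) for segment in spec.split("/"))


def prune(partitions, **predicate):
    """Keep only the partition paths whose key=value pairs match every predicate.

    Partition values live in directory names, so they are compared as strings.
    """
    wanted = {key: str(value) for key, value in predicate.items()}
    kept = []
    for spec in partitions:
        values = parse_partition(spec)
        if all(values.get(key) == value for key, value in wanted.items()):
            kept.append(spec)
    return kept


# The first entry comes from the listing above; the second is hypothetical.
table_name1 = ["year=2016/month=12/day=1/part=10", "year=2016/month=11/day=30/part=3"]

print(prune(table_name1, year=2016, month=12, day=1))
# ['year=2016/month=12/day=1/part=10']
```

The filter never looks at row data, only at path segments, which is why filtering on partition columns is cheap compared with filtering on ordinary columns.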

Now we can join the tables as follows:

select i.col1, c.col1
FROM (SELECT * FROM table_name1 WHERE year=2016 AND month=12 AND day=1) i
JOIN (SELECT * FROM table_name2 WHERE year=2016 AND month=12 AND day=1) c
ON i.col2 = c.col2
AND i.col3 = c.col3
GROUP BY i.col1, c.col1
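To sanity-check that filtering each side in a subquery returns the same rows as filtering after the join, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive. SQLite has no partitions, so year, month and day are ordinary columns, and all rows are invented sample data. It only checks result equivalence; it says nothing about how Hive plans or prunes either form.

```python
import sqlite3


def compare_subquery_and_where():
    """Run the subquery-filtered join and a WHERE-filtered join on sample data."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    for table in ("table_name1", "table_name2"):
        cur.execute(
            f"CREATE TABLE {table} (col1 TEXT, col2 INTEGER, col3 INTEGER, "
            "year INTEGER, month INTEGER, day INTEGER)"
        )
    # Hypothetical rows: two on 2016-12-01 and one on 2016-11-30 per table.
    cur.executemany("INSERT INTO table_name1 VALUES (?, ?, ?, ?, ?, ?)", [
        ("a", 1, 10, 2016, 12, 1),
        ("b", 2, 20, 2016, 12, 1),
        ("c", 1, 10, 2016, 11, 30),
    ])
    cur.executemany("INSERT INTO table_name2 VALUES (?, ?, ?, ?, ?, ?)", [
        ("x", 1, 10, 2016, 12, 1),
        ("y", 2, 99, 2016, 12, 1),
        ("z", 1, 10, 2016, 11, 30),
    ])
    # Filter each side first, then join (the form used in the answer).
    subquery_form = """
        SELECT i.col1, c.col1
        FROM (SELECT * FROM table_name1 WHERE year=2016 AND month=12 AND day=1) i
        JOIN (SELECT * FROM table_name2 WHERE year=2016 AND month=12 AND day=1) c
          ON i.col2 = c.col2 AND i.col3 = c.col3
        GROUP BY i.col1, c.col1
        ORDER BY i.col1, c.col1
    """
    # Join first, then filter both sides in WHERE.
    where_form = """
        SELECT i.col1, c.col1
        FROM table_name1 i JOIN table_name2 c
          ON i.col2 = c.col2 AND i.col3 = c.col3
        WHERE i.year=2016 AND i.month=12 AND i.day=1
          AND c.year=2016 AND c.month=12 AND c.day=1
        GROUP BY i.col1, c.col1
        ORDER BY i.col1, c.col1
    """
    sub_rows = cur.execute(subquery_form).fetchall()
    where_rows = cur.execute(where_form).fetchall()
    conn.close()
    return sub_rows, where_rows


sub_rows, where_rows = compare_subquery_and_where()
print(sub_rows)  # [('a', 'x')]
```

Both forms keep only the 2016-12-01 pair that matches on col2 and col3; the November rows would also match on those columns, so dropping the date filter from either side would change the result.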


These conditions should go in the WHERE clause.
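For an inner join like the one in the question, the filters return the same rows whether they sit in the ON clause or in the WHERE clause. The sketch below checks that on a few invented rows, using Python's built-in sqlite3 module as a stand-in for Hive. SQLite has no partitions, so country and state are plain columns here, and the example says nothing about Hive's execution plan or performance.

```python
import sqlite3


def run_both_forms():
    """Compare filters placed in ON versus WHERE for an inner join."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical tables mirroring the question's schema.
    cur.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT, state TEXT)")
    cur.execute(
        "CREATE TABLE items (customer_id INTEGER, name TEXT, value INTEGER, "
        "country TEXT, state TEXT)"
    )
    cur.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
        (1, "alice", "USA", "TX"),
        (2, "bob", "USA", "CA"),
        (3, "carol", "USA", "TX"),
    ])
    cur.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", [
        (1, "pen", 3, "USA", "TX"),
        (2, "ink", 5, "USA", "CA"),
        (3, "pad", 7, "USA", "TX"),
        (3, "cap", 2, "USA", "CA"),
    ])
    on_form = """
        SELECT customers.id, customers.name, items.name, items.value
        FROM customers JOIN items
          ON customers.id = items.customer_id
         AND customers.country = 'USA' AND customers.state = 'TX'
         AND items.country = 'USA' AND items.state = 'TX'
        ORDER BY customers.id, items.name
    """
    where_form = """
        SELECT customers.id, customers.name, items.name, items.value
        FROM customers JOIN items
          ON customers.id = items.customer_id
        WHERE customers.country = 'USA' AND customers.state = 'TX'
          AND items.country = 'USA' AND items.state = 'TX'
        ORDER BY customers.id, items.name
    """
    on_rows = cur.execute(on_form).fetchall()
    where_rows = cur.execute(where_form).fetchall()
    conn.close()
    return on_rows, where_rows


on_rows, where_rows = run_both_forms()
print(on_rows == where_rows)  # True
print(on_rows)  # [(1, 'alice', 'pen', 3), (3, 'carol', 'pad', 7)]
```

Keeping the filters in WHERE leaves the ON clause with only the join key, which many readers find easier to scan.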