Union grouped result sets in Hive
I need to split a Hive query that groups by an ID column into the quarters of calendar year 2018. Below is what I am currently doing; I am hoping there is another option that gets the same result with fewer queries.
--Query 1: Q1 2018, plus three identical queries for Q2, Q3 and Q4
Create TABLE Q12018 stored as ORC as
select
ID,
count(1) as cnt,
sum(revenue) as revenue,
sum( (CASE
WHEN condition1
THEN 1
ELSE 0 END)) as metric1,
sum( (CASE
WHEN condition2
THEN revenue
ELSE 0 END)) as metric2,
sum( (CASE
WHEN condition3
THEN 1
ELSE 0 END)) as metric3,
sum( (CASE
WHEN condition4
THEN revenue
ELSE 0 END)) as metric4
from mainTable
where month between 201801 and 201803
group by
ID;
--Query 2
--Query 3
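The body of Query 2 is not shown. Since the final aggregation reads from a table named combined2018, it presumably stacks the four quarterly tables with UNION ALL. A minimal sketch under that assumption; the table names Q22018, Q32018 and Q42018 are hypothetical, patterned on Q12018:

```sql
-- Assumed shape of Query 2: stack the four quarterly tables into combined2018.
-- Q22018, Q32018 and Q42018 are hypothetical names patterned on Q12018.
CREATE TABLE combined2018 STORED AS ORC AS
SELECT u.ID, u.cnt, u.revenue, u.metric1, u.metric2, u.metric3, u.metric4
FROM (
  SELECT ID, cnt, revenue, metric1, metric2, metric3, metric4 FROM Q12018
  UNION ALL
  SELECT ID, cnt, revenue, metric1, metric2, metric3, metric4 FROM Q22018
  UNION ALL
  SELECT ID, cnt, revenue, metric1, metric2, metric3, metric4 FROM Q32018
  UNION ALL
  SELECT ID, cnt, revenue, metric1, metric2, metric3, metric4 FROM Q42018
) u;
```

UNION ALL keeps every row, so an ID that appears in several quarters produces several rows in combined2018, and the later group by ID collapses them. A plain UNION would drop identical rows coming from different quarters and undercount.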
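It looks like, in the end, you are summing all the quarterly results grouped by ID. If the final result is just the sum of the quarterly results, change the where clause to cover the whole year to get the same final result: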
select
ID,
count(1) as cnt,
sum(revenue) as revenue,
sum((CASE WHEN condition1 THEN 1 ELSE 0 END)) as metric1,
sum((CASE WHEN condition2 THEN revenue ELSE 0 END)) as metric2,
sum((CASE WHEN condition3 THEN 1 ELSE 0 END)) as metric3,
sum((CASE WHEN condition4 THEN revenue ELSE 0 END)) as metric4
from mainTable
where month between 201801 and 201812
group by ID;
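I need to split the query into quarterly chunks because of the size of the main table; our cluster is unstable on queries that span long date ranges. My original query was exactly what you suggest, but it had many performance problems.
@hghg Then what you need to look at is tuning for proper parallelism.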
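As a rough illustration of what tuning parallelism can mean in Hive, the session settings below are a sketch only. The property names are standard Hive and Tez configuration keys, but every value is an assumption and has to be adapted to the cluster and execution engine in use:

```sql
-- Illustrative values only; none of them come from the original thread.

-- Let independent stages of a single query run concurrently.
SET hive.exec.parallel=true;
SET hive.exec.parallel.thread.number=8;

-- Less input per reducer means more reducers for the GROUP BY (64 MB here).
SET hive.exec.reducers.bytes.per.reducer=67108864;

-- On Tez, smaller split groups mean more mapper tasks (16 MB to 128 MB here).
SET tez.grouping.min-size=16777216;
SET tez.grouping.max-size=134217728;
```

More, smaller tasks tend to reduce the memory pressure of any single task at the cost of scheduling overhead, so it is worth changing one setting at a time and comparing run times.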
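-- Final aggregation of the combined quarterly results, one row per ID
Create TABLE Agg2018 stored as ORC as
Select
ID,
Sum(cnt) as cnt,
Sum(revenue) as revenue,
Sum(metric1) as metric1,
Sum(metric2) as metric2,
sum(metric3) as metric3,
sum(metric4) as metric4
from combined2018
group by ID;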