Union of grouped result sets in Hive

Tags: hive, bigdata, hiveql

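I need to break a Hive query, grouped by the ID column, into the quarters of calendar year 2018. Below is what I am currently doing; I am hoping there is another option that gets the same result with fewer queries.

-- Query 1: Q1 2018, plus three identical queries for Q2, Q3, and Q4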

Create TABLE Q12018 stored as ORC as
select
ID,
count(1) as cnt, 
sum(revenue) as revenue,
sum( (CASE
    WHEN condition1
    THEN 1
    ELSE 0 END)) as metric1,
sum( (CASE
    WHEN condition2
    THEN revenue
    ELSE 0 END)) as metric2,           

sum( (CASE
    WHEN condition3
    THEN 1
    ELSE 0 END)) as metric3,
sum( (CASE
    WHEN condition4
    THEN revenue
    ELSE 0 END)) as metric4                            
from mainTable
where month between 201801 and 201803
group by 
ID;
-- Query 2

-- Query 3


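It looks like, in the end, you are summing all the quarterly results grouped by ID. If the final result is simply the sum of the quarterly results, change the WHERE clause to cover the whole year and you get the same final result: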

select
   ID,
   count(1) as cnt, 
   sum(revenue) as revenue,
   sum((CASE  WHEN condition1  THEN 1  ELSE 0 END)) as metric1,
   sum((CASE  WHEN condition2  THEN revenue  ELSE 0 END)) as metric2,           
   sum((CASE  WHEN condition3  THEN 1  ELSE 0 END)) as metric3,
   sum((CASE  WHEN condition4  THEN revenue  ELSE 0 END)) as metric4                       
from mainTable
where month between 201801 and 201812
group by ID;
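
If the per-quarter breakdown is still wanted alongside the yearly totals, one possible variant keeps a single statement and adds a derived quarter column to the grouping. This is only a sketch: the alias `qtr` is made up for illustration, `month` is assumed to be an integer in `YYYYMM` form (as the comparisons in the question suggest), and `condition1` to `condition4` remain the placeholders used in the question.

```sql
-- Sketch: one row per ID per quarter, produced in a single pass.
-- "qtr" is a hypothetical alias; condition1..condition4 are placeholders.
select
   ID,
   qtr,
   count(1) as cnt,
   sum(revenue) as revenue,
   sum(CASE WHEN condition1 THEN 1 ELSE 0 END) as metric1,
   sum(CASE WHEN condition2 THEN revenue ELSE 0 END) as metric2,
   sum(CASE WHEN condition3 THEN 1 ELSE 0 END) as metric3,
   sum(CASE WHEN condition4 THEN revenue ELSE 0 END) as metric4
from (
   select
      t.*,
      -- Map each YYYYMM value to its calendar quarter
      CASE
         WHEN month between 201801 and 201803 THEN 1
         WHEN month between 201804 and 201806 THEN 2
         WHEN month between 201807 and 201809 THEN 3
         ELSE 4
      END as qtr
   from mainTable t
   where month between 201801 and 201812
) q
group by ID, qtr;
```

Summing these rows per ID gives the same totals as the yearly query above. Note that it still scans the full year in one job, so it does not reduce the load on the cluster.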

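I need to split the query into quarterly chunks because of the size of the main table; our cluster is unstable on queries that span long date ranges. My original attempt was exactly what you suggest, but it had a lot of performance problems.

@hghg In that case, what you should look at is tuning for proper parallelism.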
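
The comment does not say which settings to tune, so the following is only a sketch of session-level properties that are commonly adjusted for this kind of aggregation. The property names are real Hive and Tez settings, but every value here is an illustrative assumption and would need to be tested against the actual cluster.

```sql
-- Run independent stages of a query concurrently
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=8;

-- Fewer bytes per reducer means more reducers for the GROUP BY (value is a guess)
set hive.exec.reducers.bytes.per.reducer=67108864;

-- Partial aggregation on the map side reduces the data shuffled to reducers
set hive.map.aggr=true;

-- On Tez, smaller split groups mean more mapper tasks (values are guesses)
set tez.grouping.min-size=16777216;
set tez.grouping.max-size=134217728;
```

The Tez properties only matter when the execution engine is Tez; on MapReduce they are ignored.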
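The final aggregation over the combined quarterly results, for reference:

Create TABLE Agg2018 stored as ORC as
Select
   ID,
   sum(cnt) as cnt,
   sum(revenue) as revenue,
   sum(metric1) as metric1,
   sum(metric2) as metric2,
   sum(metric3) as metric3,
   sum(metric4) as metric4
from combined2018
group by ID;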
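
How `combined2018` was built is not shown here. One way to keep the quarterly chunks while avoiding four separate tables and a union step is to create the table from the first quarter and append the remaining quarters to it. This is a sketch under the assumption that `combined2018` should hold one row per ID per quarter; `condition1` to `condition4` are the placeholders from the question.

```sql
-- Q1: create the combined table from the first quarter's aggregate
create table combined2018 stored as ORC as
select
   ID,
   count(1) as cnt,
   sum(revenue) as revenue,
   sum(CASE WHEN condition1 THEN 1 ELSE 0 END) as metric1,
   sum(CASE WHEN condition2 THEN revenue ELSE 0 END) as metric2,
   sum(CASE WHEN condition3 THEN 1 ELSE 0 END) as metric3,
   sum(CASE WHEN condition4 THEN revenue ELSE 0 END) as metric4
from mainTable
where month between 201801 and 201803
group by ID;

-- Q2: append to the same table; repeat with 201807-201809 and 201810-201812
insert into table combined2018
select
   ID,
   count(1) as cnt,
   sum(revenue) as revenue,
   sum(CASE WHEN condition1 THEN 1 ELSE 0 END) as metric1,
   sum(CASE WHEN condition2 THEN revenue ELSE 0 END) as metric2,
   sum(CASE WHEN condition3 THEN 1 ELSE 0 END) as metric3,
   sum(CASE WHEN condition4 THEN revenue ELSE 0 END) as metric4
from mainTable
where month between 201804 and 201806
group by ID;
```

Each statement touches only one quarter of `mainTable`, and the aggregation into `Agg2018` can then run over `combined2018` as a single small job.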