Hive: workaround for column-level subqueries


I'm running into a problem with column-level subqueries. Suppose that for each store I want today's transaction together with the transaction from yesterday, from last week, and from last month. The table being self-joined contains date, store, and transaction.

I know this is achievable with column-level subqueries in a traditional data warehouse, but I found that Hive lacks this feature, so I wrote my own query, shown below:

select main_table.date,main_table.store,main_table.transaction,yest_table.transaction as yesterday_trans, lw_table.transaction as lastweek_trans, lm_table.transaction as lastmonth_trans
    from
    (select date, store, transaction from table where date=current_date)main_table
    left join
    (select date, store, transaction from table where date=date_sub(current_date,1))yest_table
    on date_sub(main_table.date,1)=yest_table.date and main_table.store=yest_table.store
    left join
    (select date, store, transaction from table where date=date_sub(current_date,7))lw_table
    on date_sub(main_table.date,7)=lw_table.date and main_table.store=lw_table.store
    left join
    (select date, store, transaction from table where date=add_months(current_date,-1))lm_table
    on add_months(current_date,-1)=lm_table.date and main_table.store=lm_table.store
Is this correct? I suspect there may be a better solution.

Thank you.

Use CASE + max() aggregation:

select main.date,main.store,main.transaction,s.yesterday_trans,s.lastweek_trans,s.lastmonth_trans
    from
    (select date, store, transaction from table where date=current_date)main
    left join
    (select store, 
       max(case when date = date_sub(current_date,1)    then transaction end) yesterday_trans,  
       max(case when date = date_sub(current_date,7)    then transaction end) lastweek_trans,
       max(case when date = add_months(current_date,-1) then transaction end) lastmonth_trans
       from table 
      where date>=add_months(current_date,-1) and date<=date_sub(current_date,1)
      group by store
    ) s on main.store=s.store;
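This way you eliminate two unnecessary table scans and joins. This solution works only for current_date, or for a fixed parameter passed in place of current_date. If you want to select many dates from the main table, the solution with three joins on date + store will work best.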
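To illustrate the "fixed parameter in place of current_date" case, here is a minimal sketch of the same CASE + max() query driven by a Hive variable. The variable name run_date and the table name store_sales are assumptions made for this example (the thread only uses the placeholder name table), and the date value is arbitrary.

```sql
-- Assumption: run_date is a hypothetical variable; it could also be passed
-- from the command line with: hive --hivevar run_date=2020-03-31 -f query.hql
set hivevar:run_date=2020-03-31;

-- Assumption: store_sales stands in for the thread's placeholder table name.
-- `date` and `transaction` are backtick-quoted because they are keywords.
select main.`date`, main.store, main.`transaction`,
       s.yesterday_trans, s.lastweek_trans, s.lastmonth_trans
  from (select `date`, store, `transaction`
          from store_sales
         where `date` = '${hivevar:run_date}') main
  left join
       (select store,
               max(case when `date` = date_sub('${hivevar:run_date}', 1)    then `transaction` end) yesterday_trans,
               max(case when `date` = date_sub('${hivevar:run_date}', 7)    then `transaction` end) lastweek_trans,
               max(case when `date` = add_months('${hivevar:run_date}', -1) then `transaction` end) lastmonth_trans
          from store_sales
         -- one scan covers the whole look-back window
         where `date` >= add_months('${hivevar:run_date}', -1)
           and `date` <= date_sub('${hivevar:run_date}', 1)
         group by store
       ) s on main.store = s.store;
```

The query is still tied to a single reference date per run, so historical reporting would mean running it once per date or falling back to the join-based version.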

Hmm, maybe LAG is also an applicable solution:

select date,store,transaction,
    case when lag(date,1) over(partition by store order by date) = date_sub(date,1) --check if LAG(1) is yesterday (previous date)
         then lag(transaction,1) over(partition by store order by date)
    end as yesterday_trans 
...
--where date>=add_months(current_date,-1) and date<=date_sub(current_date,1)

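Add aggregation if necessary. If the LAG solution is applicable, it will be the fastest one, because it needs no joins at all and does everything in a single scan. If you have many records per date, you can probably pre-aggregate them before applying LAG. This works not only for current_date.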
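The pre-aggregation idea could look roughly like the sketch below. It assumes that sum() is the right way to combine several records per store and date, and it uses the hypothetical table name store_sales in place of the thread's placeholder; neither is confirmed by the question.

```sql
-- Assumption: store_sales is a hypothetical table name, and sum() is assumed
-- to be the appropriate aggregate for multiple rows per (date, store).
select `date`, store, `transaction`,
       -- take the previous row only if it really is the previous calendar day
       case when lag(`date`, 1) over (partition by store order by `date`) = date_sub(`date`, 1)
            then lag(`transaction`, 1) over (partition by store order by `date`)
       end as yesterday_trans,
       -- same guard for the row seven positions back
       case when lag(`date`, 7) over (partition by store order by `date`) = date_sub(`date`, 7)
            then lag(`transaction`, 7) over (partition by store order by `date`)
       end as lastweek_trans
  from (select `date`, store, sum(`transaction`) as `transaction`
          from store_sales
         group by `date`, store
       ) daily;
```

Note that the lastweek_trans guard returns NULL whenever a store has a missing day inside the last seven days, even if a row for the date seven days earlier exists, because LAG counts rows rather than calendar days. If gaps are common, the join-based or CASE + max() variants are the safer choice.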

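What is the expected output?

@TKHN Sorry, I have updated the question; the subqueries were joined on a repeated condition.

BTW, the WHERE filter in the lm_table subquery should be date=add_months(current_date,-1).

@galih If we don't use current_date, that means historical data, so we would have to pass parameters?

@TKHN Then only the original solution is applicable, for columns with high cardinality.

Yes, agreed, this is an optimized option. Two table scans would be a heavy operation.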