Hive: multiple subqueries and GROUP BY


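I am moving my statistics from MySQL to Amazon DynamoDB and Elastic MapReduce.

I have a query tool that works against MySQL. I have the same table in Hive and need the same results as in MySQL (product views for the last week, last month and last year).

I worked out how to get one of these results in Hive, for example the last month: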

SELECT product_id, COUNT(product_id) AS views
FROM dev_product_views_hive
WHERE created >= UNIX_TIMESTAMP(CONCAT(DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()), 31), " ", "00:00:00"))
GROUP BY product_id;

But I need the results grouped the way MySQL returns them:

product_id  views_last_week  views_last_month  views_last_year
2           564              2460              29967
4           980              3986              54982
Is it possible to do this with Hive?

Thanks in advance,

Amer

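You can do this by putting a case when inside sum() or count().

For example, concat(date_sub(to_date(from_unixtime(unix_timestamp())), days), " 00:00:00") returns a formatted string for midnight of the date that lies the given number of days before the current time.

The case when returns 1 when created >= the cutoff for the number of days you want.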
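
To see what the cutoff expression produces, here is a minimal sketch that uses a hard-coded date instead of the current time. The date 2013-05-20 is only an illustrative value, and a SELECT without a FROM clause assumes Hive 0.13 or later; on older versions you would select from any one-row table instead.

```sql
-- Illustrative fixed date (an assumption, not from the question):
-- go back 7 days, then pad the date to midnight.
SELECT concat(date_sub('2013-05-20', 7), ' 00:00:00') AS cutoff;
-- cutoff: 2013-05-13 00:00:00
```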

You can also do this with Hive's built-in count(), which only counts rows where the expression is non-NULL:

count(case when created >= concat(date_sub(to_date(from_unixtime(unix_timestamp())), 7)," 00:00:00") then 1 end)  as weekly

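Yes, I always get a parse error. I don't know how to write this MySQL subquery correctly in HiveQL.
HiveQL's syntax is not the same as MySQL's. HiveQL only supports subqueries in the FROM clause.
I know. Is there another way to achieve this?
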
select product_id, 
sum(case when created >= concat(date_sub(to_date(from_unixtime(unix_timestamp())), 7)," 00:00:00") then 1 else 0 end)  as weekly,
sum(case when created >= concat(date_sub(to_date(from_unixtime(unix_timestamp())), 31)," 00:00:00") then 1 else 0 end) as monthly,
sum(case when created >= concat(date_sub(to_date(from_unixtime(unix_timestamp())), 365)," 00:00:00") then 1 else 0 end) as yearly
from dev_product_views_hive 
group by product_id;
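
The query in the question compares created with UNIX_TIMESTAMP(...), which suggests the column may hold epoch seconds rather than a formatted string. If that is the case (an assumption, since the table schema is not shown), comparing it directly with a 'yyyy-MM-dd HH:mm:ss' string would probably not behave as intended. A possible variant converts each cutoff to epoch seconds first; the column aliases follow the output layout requested in the question:

```sql
-- Assumption: `created` is a BIGINT of epoch seconds, so each cutoff string
-- is converted with unix_timestamp() before the comparison.
SELECT product_id,
       sum(CASE WHEN created >= unix_timestamp(concat(date_sub(to_date(from_unixtime(unix_timestamp())), 7), ' 00:00:00'))
                THEN 1 ELSE 0 END) AS views_last_week,
       sum(CASE WHEN created >= unix_timestamp(concat(date_sub(to_date(from_unixtime(unix_timestamp())), 31), ' 00:00:00'))
                THEN 1 ELSE 0 END) AS views_last_month,
       sum(CASE WHEN created >= unix_timestamp(concat(date_sub(to_date(from_unixtime(unix_timestamp())), 365), ' 00:00:00'))
                THEN 1 ELSE 0 END) AS views_last_year
FROM dev_product_views_hive
GROUP BY product_id;
```

If created is stored as a formatted string instead, the string comparison in the query above already applies and no conversion is needed.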