SQL Hive - multiple (average) distinct counts over layered groups
Given the following source data (assume the table is named user_activity):
I would like to get the following result:
+-----------+------------+---------------------+
| user_type | user_count | average_daily_users |
+-----------+------------+---------------------+
| a | 3 | 2 |
| b | 2 | 1.5 |
+-----------+------------+---------------------+
using a single query on the same table, without multiple subqueries.
Using multiple queries, I can get:
- For user_count:
select user_type, count(distinct user_id) from user_activity group by user_type
- For average_daily_users:
select user_type, avg(distinct_users) as average_daily_users from ( select user_type, count(distinct user_id) as distinct_users from user_activity group by user_type, some_date ) t group by user_type
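Note 2: This question shares details with the "partition by" columns in window functions (for computing the average_daily_users column).
This should do what you want: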
select ua.user_type,
count(distinct ua.user_id) as user_count,
count(distinct some_date || ':' || ua.user_id) / count(distinct some_date) as average_daily_users
from user_activity ua
group by ua.user_type;
Nice idea! I hadn't even considered computing the average myself. This feels so wrong, and yet so right. Great!
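To see why the single-query form agrees with the nested version, here is a minimal sketch that runs both against an in-memory SQLite database from Python. The sample rows are hypothetical (the original source table is not shown here) and were chosen only so that they reproduce the expected result. SQLite stands in for Hive, so the `* 1.0` is an assumption needed to avoid SQLite's integer division. The idea being checked: the number of distinct (date, user) pairs divided by the number of distinct dates equals the average of the per-day distinct user counts.

```python
import sqlite3

# Hypothetical sample rows (user_type, user_id, some_date). The original
# source data is not available, so these are illustrative only.
ROWS = [
    ("a", 1, "2020-01-01"),
    ("a", 2, "2020-01-01"),
    ("a", 2, "2020-01-01"),  # repeated activity, must not be double counted
    ("a", 2, "2020-01-02"),
    ("a", 3, "2020-01-02"),
    ("b", 4, "2020-01-01"),
    ("b", 4, "2020-01-02"),
    ("b", 5, "2020-01-02"),
]

# Single pass: distinct (date, user) pairs divided by distinct dates.
# "* 1.0" forces floating-point division in SQLite.
SINGLE_QUERY = """
select user_type,
       count(distinct user_id) as user_count,
       count(distinct some_date || ':' || user_id) * 1.0
           / count(distinct some_date) as average_daily_users
from user_activity
group by user_type
order by user_type
"""

# Nested form: count distinct users per day, then average per user_type.
SUBQUERY = """
select user_type, avg(distinct_users) as average_daily_users
from (
    select user_type, count(distinct user_id) as distinct_users
    from user_activity
    group by user_type, some_date
) t
group by user_type
order by user_type
"""


def run_queries():
    """Load the sample rows and return the results of both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table user_activity "
        "(user_type text, user_id integer, some_date text)"
    )
    conn.executemany("insert into user_activity values (?, ?, ?)", ROWS)
    single = conn.execute(SINGLE_QUERY).fetchall()
    nested = conn.execute(SUBQUERY).fetchall()
    conn.close()
    return single, nested


if __name__ == "__main__":
    single, nested = run_queries()
    for row in single:
        print(row)
    for row in nested:
        print(row)
```

Both forms average only over days on which a user_type had at least one row, so days without any activity are ignored in either case. The ':' separator keeps different (date, user) pairs from colliding after concatenation, provided user_id values do not themselves contain that character.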