SQL grouping at different levels on the same dataset
Tags: sql, sql-server, hadoop, hive, cloudera
I have the following dataset, and I would like to create different groups to count the occurrences of the values under name.
Have: (county is embedded in the string)
Want:
I believe that in SQL, OVER (PARTITION BY ...) would allow counting at different levels. For example:
select name, state, county,
count(name) over (partition by name) as freq_name,
count(name) over (partition by state) as freq_state,
count(name) as freq_county
from have
group by name,state, county;
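For some reason, this code does not give the correct count for freq_name. I would also like to check whether my freq_state and freq_county code is accurate. Thanks.
For freq_name, use count(*) instead of count(name).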
You seem to want:
select name, state, county, count(*) as this_count,
sum(count(*)) over (partition by name) as freq_name,
sum(count(*)) over (partition by state) as freq_state,
count(*) as freq_county
from have
group by name, state, county;
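To see why the nested sum(count(*)) form matters, here is a minimal sketch on made-up rows. The sample data and the name have_sample are assumptions for illustration only, since the asker's actual table is not shown. After GROUP BY, a window function sees one row per (name, state, county) group, so count(*) over (...) counts groups, while sum(count(*)) over (...) adds up the underlying row counts. It assumes an engine that accepts an aggregate inside a window function.

```sql
-- Hypothetical sample rows; the real `have` table is not shown in the question.
with have_sample as (
    select 'Ann' as name, 'TX' as state, 'Travis' as county union all
    select 'Ann', 'TX', 'Travis' union all
    select 'Ann', 'TX', 'Harris' union all
    select 'Ann', 'CA', 'Marin'  union all
    select 'Bob', 'TX', 'Travis'
)
select name, state, county,
       count(*)                                as this_count,       -- rows in this (name, state, county) group
       count(*)      over (partition by name)  as groups_per_name,  -- number of groups per name, not rows
       sum(count(*)) over (partition by name)  as freq_name,        -- rows per name
       sum(count(*)) over (partition by state) as freq_state        -- rows per state, across all names
from have_sample
group by name, state, county;
```

With these rows, Ann forms three groups but appears in four rows, so groups_per_name is 3 while freq_name is 4. Note that freq_state counts every row in the state regardless of name (4 for TX here); if the goal is occurrences of a name within a state, partitioning by name, state instead would be one option.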
Comments:
Also, does my freq_state code look correct? I am working with a large dataset, so it is hard to verify at a detailed level. I am fairly sure freq_county should work...
Would count(*) let me count the appearances of name by state only?
@lydias This uses window functions. Yes. Did you run the query?
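On a large table, one way to check freq_state without inspecting individual rows might be to compute the state totals independently with a plain GROUP BY and compare them against the windowed result. This is only a sketch: it assumes the have table with columns name, state, county from the question, the CTE names windowed and state_totals are hypothetical, and the engine must accept an aggregate inside a window function.

```sql
-- Windowed result, as in the suggested query.
with windowed as (
    select name, state, county,
           sum(count(*)) over (partition by state) as freq_state
    from have
    group by name, state, county
),
-- Independent per-state row counts from a plain GROUP BY.
state_totals as (
    select state, count(*) as n_rows
    from have
    group by state
)
select w.name, w.state, w.county, w.freq_state, s.n_rows
from windowed w
join state_totals s
  on s.state = w.state          -- rows with a NULL state do not join and are not compared
where w.freq_state <> s.n_rows; -- an empty result means the two methods agree
```

The same pattern works for freq_name by grouping the second CTE on name instead of state.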