SQL: calculating the percentage of null values per group with GROUP BY


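I am trying to adapt an approach that counts null values per group. The only difference is that I want percentages (e.g. 30% of column 1 is null for 2016) rather than counts (e.g. 6521 values of column 1 are null for 2016). My query is: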

WITH t1nulltest AS
( select date_column
,SUM(IF(c1 IS NULL,1,0))/count(*) OVER (PARTITION BY date_column) as c1null
,SUM(IF(c2 IS NULL,1,0))/count(*) OVER (PARTITION BY date_column) as c2null
,SUM(IF(c3 IS NULL,1,0))/count(*) OVER (PARTITION BY date_column) as c3null
,SUM(IF(c4 IS NULL,1,0))/count(*) OVER (PARTITION BY date_column) as c4null
,SUM(IF(c5 IS NULL,1,0))/count(*) OVER (PARTITION BY date_column) as c5null
,row_number() OVER (PARTITION BY date_column) as rowno
from t1) 
select 
  date_column, c1null, c2null,c3null,c4null,c5null from t1nulltest
  where rowno =1;
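The only difference from the original solution is that I added /count(*), but this does not work and I would like to know why. The original query works. My query gives the error: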

Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: line 2:68 Expression not in GROUP BY key 'date_column'
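
A likely cause, judging from the error text: the OVER clause binds only to count(*), so SUM(IF(...)) is left as a plain aggregate. A plain aggregate with no GROUP BY cannot sit next to the bare date_column in the select list, which matches the "Expression not in GROUP BY key" message. Below is a minimal sketch of a window-only version. It assumes the same table t1 and columns c1 to c5, and that the original working query applied OVER to each SUM. It has not been run against Hive here.

```sql
-- Sketch: give every aggregate its own OVER clause so that no plain
-- (non-windowed) aggregate remains in the select list.
WITH t1nulltest AS (
    SELECT date_column
        ,SUM(IF(c1 IS NULL,1,0)) OVER (PARTITION BY date_column)
            / COUNT(*) OVER (PARTITION BY date_column) AS c1null
        ,SUM(IF(c2 IS NULL,1,0)) OVER (PARTITION BY date_column)
            / COUNT(*) OVER (PARTITION BY date_column) AS c2null
        ,SUM(IF(c3 IS NULL,1,0)) OVER (PARTITION BY date_column)
            / COUNT(*) OVER (PARTITION BY date_column) AS c3null
        ,SUM(IF(c4 IS NULL,1,0)) OVER (PARTITION BY date_column)
            / COUNT(*) OVER (PARTITION BY date_column) AS c4null
        ,SUM(IF(c5 IS NULL,1,0)) OVER (PARTITION BY date_column)
            / COUNT(*) OVER (PARTITION BY date_column) AS c5null
        ,row_number() OVER (PARTITION BY date_column) AS rowno
    FROM t1
)
-- Keep one row per date_column, as in the query from the question.
SELECT date_column, c1null, c2null, c3null, c4null, c5null
FROM t1nulltest
WHERE rowno = 1;
```

Each ratio is a fraction between 0 and 1; multiply by 100 if a percentage is wanted.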

I suspect the whole logic can be simplified to a plain aggregate query:

select
    date_column,
    1.0 * (count(*) - count(c1)) / count(*) c1_null_ratio,
    1.0 * (count(*) - count(c2)) / count(*) c2_null_ratio,
    1.0 * (count(*) - count(c3)) / count(*) c3_null_ratio,
    1.0 * (count(*) - count(c4)) / count(*) c4_null_ratio,
    1.0 * (count(*) - count(c5)) / count(*) c5_null_ratio
from t1
group by date_column
This can also be expressed with avg():
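
A minimal sketch of the avg() form, assuming the same table t1 and columns c1 to c5 and reusing the column aliases from the aggregate query. Averaging a 1/0 flag per row gives the share of rows where the flag is 1, i.e. the null ratio.

```sql
-- Sketch: the mean of a 1/0 "is null" flag equals the fraction of nulls.
select
    date_column,
    avg(case when c1 is null then 1.0 else 0.0 end) c1_null_ratio,
    avg(case when c2 is null then 1.0 else 0.0 end) c2_null_ratio,
    avg(case when c3 is null then 1.0 else 0.0 end) c3_null_ratio,
    avg(case when c4 is null then 1.0 else 0.0 end) c4_null_ratio,
    avg(case when c5 is null then 1.0 else 0.0 end) c5_null_ratio
from t1
group by date_column
```

The case expression is used instead of IF() only to keep the sketch portable across SQL dialects.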

