percent_rank() in Google BigQuery, with a condition to include only certain rows
I posted a similar question before. Its solution works for the analytic function rank(), but not for percent_rank(). To demonstrate, I have the following dummy table:
with
table as (
select 'a' as category, 1 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 2 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 3 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 4 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 5 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 6 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 7 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 8 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 9 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 10 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 11 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 12 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 13 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 14 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 15 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 16 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 17 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 18 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 19 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 20 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 21 as num, 5 as num2, 7 as cutoff
)
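I need percent_rank() on the num column. However, the percentile rank should only consider rows where num2 >= cutoff. I tried the following two ways of computing the percentile: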
select
*,
if(num2 >= cutoff,
percent_rank() over(
partition by category
order by num
), null) as pctile1,
if(num2 >= cutoff,
percent_rank() over(
partition by category
order by if (num2 >= cutoff, num, null) ASC
), null) as pctile2
from table
order by num asc
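Neither pctile1 nor pctile2 is correct. To see why, look at row 10, which has pctile1 == 0.45 and pctile2 == 0.60. Among the qualifying values, however, it should have a low percentile: only 2 qualifying values are below num == 10 (namely 1 and 2), while many values above 10 qualify (11 to 19, except 17). Given num == 10 and the cutoff value, the correct percentile for num == 10 should be around 30%, since 10 is the third-lowest of the 11 qualifying values.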
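To see where 0.45 and 0.60 come from, and what PERCENT_RANK would return for num == 10 if only qualifying rows were ranked, here is a plain-Python model of the function. It is a sketch that assumes BigQuery's documented definition, (RANK - 1) / (rows in partition - 1), and that NULLs sort first in an ascending ORDER BY; the names `percent_rank`, `sort_key` and `LOW_NUM2` are hypothetical, and the rows are rebuilt from the dummy table above.

```python
# Plain-Python model of PERCENT_RANK: (rank - 1) / (rows_in_partition - 1),
# where rank follows RANK() semantics (tied keys share the lowest rank).
CUTOFF = 7
LOW_NUM2 = {3, 4, 5, 6, 7, 8, 9, 17, 20, 21}  # nums whose num2 is 5 in the dummy table
rows = [(num, 5 if num in LOW_NUM2 else 15) for num in range(1, 22)]  # (num, num2)


def sort_key(value):
    """Order NULLs (None) before every number, like an ascending ORDER BY."""
    return (value is not None, value if value is not None else 0)


def percent_rank(keys, target):
    """PERCENT_RANK of `target` in a partition whose ORDER BY keys are `keys`."""
    if len(keys) == 1:
        return 0.0
    rank = 1 + sum(1 for k in keys if sort_key(k) < sort_key(target))
    return (rank - 1) / (len(keys) - 1)


# pctile1: ORDER BY num over all 21 rows
all_keys = [num for num, _ in rows]
# pctile2: ORDER BY IF(num2 >= cutoff, num, NULL) over all 21 rows
nulled_keys = [num if num2 >= CUTOFF else None for num, num2 in rows]
# Desired: ORDER BY num over the 11 qualifying rows only
qualifying_keys = [num for num, num2 in rows if num2 >= CUTOFF]

print(percent_rank(all_keys, 10))         # 0.45: 9 smaller nums, 21 rows
print(percent_rank(nulled_keys, 10))      # 0.6: 10 NULL keys plus 1 and 2 sort first
print(percent_rank(qualifying_keys, 10))  # 0.2: rank 3 of 11
```

pctile1 ranks num == 10 against all 21 rows, and pctile2 sorts the 10 NULL keys ahead of every qualifying row, so both the rank and the row count include rows that should be ignored. Ranking only the 11 qualifying rows puts num == 10 at rank 3, which PERCENT_RANK's (rank - 1) / (n - 1) definition turns into exactly 0.2 rather than about 30%.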
Note that I shouldn't filter the table to remove the rows I am not computing percent_rank() "over", because I need to keep those rows.
Edit
I don't know how to make the image smaller, but I'm working on it.
I would just use the option below:
#standardSQL
SELECT *,
PERCENT_RANK() OVER(PARTITION BY category ORDER BY num) AS pctile
FROM table WHERE num2 >= cutoff
UNION ALL
SELECT *, NULL
FROM table WHERE num2 < cutoff
-- ORDER BY num
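The UNION ALL approach can be tried without BigQuery. The sketch below is a hedged port to SQLite through Python's standard sqlite3 module; it assumes SQLite 3.25 or later (the release that added window functions such as PERCENT_RANK), and the table name t is a hypothetical stand-in. It checks that all 21 rows survive and that only the qualifying rows receive a percentile.

```python
import sqlite3

# Hypothetical SQLite stand-in for BigQuery: the table is called t, and
# window functions such as PERCENT_RANK need SQLite 3.25 or later.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category TEXT, num INTEGER, num2 INTEGER, cutoff INTEGER)")
low_num2 = {3, 4, 5, 6, 7, 8, 9, 17, 20, 21}  # nums whose num2 is 5 in the dummy table
conn.executemany(
    "INSERT INTO t VALUES ('a', ?, ?, 7)",
    [(num, 5 if num in low_num2 else 15) for num in range(1, 22)],
)

# Rank only the qualifying rows, then append the others with a NULL percentile.
query = """
SELECT *, PERCENT_RANK() OVER (PARTITION BY category ORDER BY num) AS pctile
FROM t WHERE num2 >= cutoff
UNION ALL
SELECT *, NULL FROM t WHERE num2 < cutoff
"""
# Each row is (category, num, num2, cutoff, pctile); key the percentile by num.
pctile_by_num = {row[1]: row[4] for row in conn.execute(query)}
print(len(pctile_by_num), pctile_by_num[10], pctile_by_num[3])  # 21 0.2 None
```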
In my view the UNION ALL version is easier to read, but the following is most likely what you want:
SELECT *,
IF(num2 >= cutoff,
PERCENT_RANK() OVER(PARTITION BY IF(num2 >= cutoff, category, NULL) ORDER BY num),
NULL) AS pctile
FROM table
-- ORDER BY num
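The conditional PARTITION BY works because IF(num2 >= cutoff, category, NULL) gives every non-qualifying row the partition key NULL, and rows with a NULL key are grouped into a partition of their own. Those rows therefore never enter the rank or the row count of the qualifying partition, and the outer IF discards the percentiles computed inside the NULL partition. The hedged SQLite sketch below shows this; CASE WHEN stands in for BigQuery's IF, the table name t is an assumption, and a COUNT(*) column (not in the original query) reports each row's partition size to make the split visible.

```python
import sqlite3

# Hypothetical SQLite stand-in for BigQuery: CASE WHEN replaces IF(), and the
# table is called t. Window functions need SQLite 3.25 or later.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category TEXT, num INTEGER, num2 INTEGER, cutoff INTEGER)")
low_num2 = {3, 4, 5, 6, 7, 8, 9, 17, 20, 21}  # nums whose num2 is 5 in the dummy table
conn.executemany(
    "INSERT INTO t VALUES ('a', ?, ?, 7)",
    [(num, 5 if num in low_num2 else 15) for num in range(1, 22)],
)

query = """
SELECT
  num,
  -- Non-qualifying rows get partition key NULL and form a separate partition;
  -- the outer CASE then discards the percentiles computed inside it.
  CASE WHEN num2 >= cutoff THEN
    PERCENT_RANK() OVER (
      PARTITION BY CASE WHEN num2 >= cutoff THEN category END
      ORDER BY num)
  END AS pctile,
  -- Size of the partition each row was ranked in.
  COUNT(*) OVER (PARTITION BY CASE WHEN num2 >= cutoff THEN category END) AS part_size
FROM t
"""
by_num = {num: (pctile, part_size) for num, pctile, part_size in conn.execute(query)}
print(by_num[10])  # (0.2, 11)
print(by_num[3])   # (None, 10)
```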
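Obviously, this gives the same result as the UNION ALL query.
Ahh, I used the if condition in the ORDER BY instead of the PARTITION BY. Thanks for sharing/correcting.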