SQL: how do I return all rows in the yellow census blocks?
Hey, the pattern is this: the whole data set should first be sorted by machine_id, then by ss2k. After that, for each machine, we should find all rows that belong to a run of at least 5 consecutive rows with flag = 'census'. In this data set, the result should be all the yellow rows. With the following command I cannot get the last 4 rows of each yellow block returned:
Tags: sql, postgresql
drop table if exists qz_panel_census_228_rank;
create table qz_panel_census_228_rank as
select t.*
from (select t.*,
             count(*) filter (where flag = 'census')
                 over (partition by machine_id, date
                       order by ss2k
                       rows between current row and 4 following) as census_cnt5,
             count(*) filter (where flag = 'census')
                 over (partition by machine_id, date) as count_census,
             row_number() over (partition by machine_id, date order by ss2k) as seqnum,
             count(*) over (partition by machine_id, date) as cnt
      from qz_panel_census_228 t
     ) t
where census_cnt5 = 5
group by 1,2,3,4,5,6,7,8,9,10,11
DISTRIBUTED BY (machine_id);
You are close, but you need to search in both directions:
select t.*
from (select t.*,
             -- 1 if this row ends or starts a window of 5 census rows;
             -- the alias must differ from the table's own flag column
             case when count(*) filter (where flag = 'census')
                           over (partition by machine_id, date
                                 order by ss2k
                                 rows between 4 preceding and current row) = 5
                    or count(*) filter (where flag = 'census')
                           over (partition by machine_id, date
                                 order by ss2k
                                 rows between current row and 4 following) = 5
                  then 1
                  else 0
             end as in_block
      from qz_panel_census_228 t
     ) t
where in_block = 1
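To see what the two-direction check actually selects, here is a small sketch on toy data. Everything in it is an assumption for illustration: the table name `t`, the nine sample rows, and the use of SQLite (through Python's `sqlite3`, which needs SQLite 3.25 or later for window functions) as a stand-in for PostgreSQL. The counts are written as `sum(case ...)` rather than `count(*) filter (...)` only to keep the sketch portable.

```python
import sqlite3

# Toy data (hypothetical): one machine, one date, nine rows ordered by ss2k.
# Rows 2-6 form a run of exactly 5 census rows; rows 8-9 form a run of 2.
flags = ["other"] + ["census"] * 5 + ["other"] + ["census"] * 2

conn = sqlite3.connect(":memory:")
conn.execute("create table t (machine_id int, date text, ss2k int, flag text)")
conn.executemany(
    "insert into t values (1, '2020-01-01', ?, ?)",
    list(enumerate(flags, start=1)),
)

query = """
select ss2k
from (select t.*,
             sum(case when flag = 'census' then 1 else 0 end)
                 over (partition by machine_id, date
                       order by ss2k
                       rows between 4 preceding and current row) as back5,
             sum(case when flag = 'census' then 1 else 0 end)
                 over (partition by machine_id, date
                       order by ss2k
                       rows between current row and 4 following) as fwd5
      from t) x
where back5 = 5 or fwd5 = 5
order by ss2k
"""

# Only the rows whose backward or forward 5-row window is all census match.
matched = [row[0] for row in conn.execute(query)]
print(matched)
```

On this data only ss2k 2 and 6 come back, i.e. the first and last row of the 5-row run; the three rows in the middle have no fully-census window that starts or ends on them.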
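Edit:
This approach will not work unless you add an extra count for every possible 5-row window, e.g. 3 preceding and 1 following, 2 preceding and 2 following, and so on. That leads to ugly code and is not very flexible.
The usual way to solve this gaps-and-islands problem is to first assign consecutive rows to a common group: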
select *
from
 (
   select t2.*,
          count(*) over (partition by machine_id, date, grp) as cnt
   from
    (
      select t1.*
      from (select t.*,
                   -- keep the same number for 'census' rows
                   sum(case when flag = 'census' then 0 else 1 end)
                       over (partition by machine_id, date
                             order by ss2k
                             rows unbounded preceding) as grp
            from qz_panel_census_228 t
           ) t1
      where flag = 'census' -- only census rows
    ) as t2
 ) t3
where cnt >= 5 -- only groups of at least 5 census rows
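The grouping idea can be tried on toy data with the sketch below. The table name `t`, the sample rows, and the use of SQLite through Python's `sqlite3` (3.25 or later) instead of PostgreSQL are all assumptions made for illustration; the SQL mirrors the query above, with the frame spelled out as `rows between unbounded preceding and current row`.

```python
import sqlite3

# Toy data (hypothetical). Machine 1: a run of 5 census rows (ss2k 2-6)
# and a run of 2 (ss2k 8-9). Machine 2: a run of only 4 census rows.
rows = [(1, i, f) for i, f in enumerate(
            ["other"] + ["census"] * 5 + ["other"] + ["census"] * 2, start=1)]
rows += [(2, i, "census") for i in range(1, 5)]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (machine_id int, date text, ss2k int, flag text)")
conn.executemany("insert into t values (?, '2020-01-01', ?, ?)", rows)

query = """
select machine_id, ss2k
from (select t2.*,
             count(*) over (partition by machine_id, date, grp) as cnt
      from (select t1.*
            from (select t.*,
                         -- running count of non-census rows: it stays
                         -- constant across a run of census rows
                         sum(case when flag = 'census' then 0 else 1 end)
                             over (partition by machine_id, date
                                   order by ss2k
                                   rows between unbounded preceding
                                            and current row) as grp
                  from t) t1
            where flag = 'census') t2) t3
where cnt >= 5
order by machine_id, ss2k
"""

# Every row of a census run with at least 5 members is returned.
result = list(conn.execute(query))
print(result)
```

Here all five rows of the long run on machine 1 are returned, while the 2-row run and machine 2's 4-row run are dropped, because the size test is applied to the whole group rather than to a fixed window around each row.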
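Wow, there must be a better way, but the only approach I could find is to build blocks of consecutive 'census' values. It looks awful, but it may be a catalyst for a better idea.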
with q1 as (
  select
    machine_id, recorded, ss2k, flag, date,
    case
      when flag = 'census' and
           lag (flag) over (order by machine_id, ss2k) != 'census'
        then 1
      else 0
    end as block
  from foo
),
q2 as (
  select
    machine_id, recorded, ss2k, flag, date,
    sum (block) over (order by machine_id, ss2k) as group_id,
    case when flag = 'census' then 1 else 0 end as census
  from q1
),
q3 as (
  select
    machine_id, recorded, ss2k, flag, date, group_id,
    sum (census) over (partition by group_id order by ss2k) as max_count
  from q2
),
groups as (
  select group_id
  from q3
  group by group_id
  having max (max_count) >= 5
)
select
  q2.machine_id, q2.recorded, q2.ss2k, q2.flag, q2.date
from
  q2
  join groups g on q2.group_id = g.group_id
where
  q2.flag = 'census'
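If you run each query in the with clause on its own, I think you will see how this develops.

Comments:
@Gordon Linoff Thank you.
What have you tried so far? We are generally happy to help, but not to simply do everything for people, unless it is something fairly simple.
@ADyson I just edited my post.
Wow, @D you are really smart, I have to say. Let me try the code on my data.
I have been doing this kind of logic with window functions for almost 20 years :-)
Hey @dnoeth, I tried this code on my data. It seems that when a yellow census block has length 5, the code only returns the first and fifth rows. For example, in the first yellow census block, only the first and fifth rows are returned. Is there a way to return all rows in the block? Thanks.
Right. I don't think the expected result can be obtained with this approach. Wait...
Hi @dnoeth, thanks for everything you have done. But this code seems to return almost all census rows in the data, not only the census rows in the yellow blocks. I tried the code on my data.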