SQL: How do I return all rows in the yellow census blocks? [sql, postgresql]

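Hey, the pattern is this: across the whole dataset, we should sort by machine_id first and then by ss2k. After that, for each machine, we should find all rows that belong to a run of at least 5 consecutive rows with flag = 'census'. In this dataset, the result should be all of the yellow rows.

With the following statement I cannot get the last 4 rows of a yellow block returned: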

drop table if exists qz_panel_census_228_rank;
create table qz_panel_census_228_rank as
select t.*
from (select t.*,
             count(*) filter (where flag = 'census') over (partition by machine_id, date order by ss2k rows between current row and 4 following) as census_cnt5,
             count(*) filter (where flag = 'census') over (partition by machine_id, date) as count_census,
             row_number() over (partition by machine_id, date order by ss2k) as seqnum,
             count(*) over (partition by machine_id, date) as cnt
      from qz_panel_census_228 t
     ) t
where census_cnt5 = 5 
group by 1,2,3,4,5,6,7,8,9,10,11
DISTRIBUTED BY (machine_id);

You're close, but you need to search in both directions:

    select t.*
    from (select t.*,
                 case when count(*) filter (where flag = 'census')
                           over (partition by machine_id, date
                                 order by ss2k
                                 rows between 4 preceding and current row) = 5
                        or count(*) filter (where flag = 'census')
                           over (partition by machine_id, date
                                 order by ss2k
                                 rows between current row and 4 following) = 5
                      then 1
                      else 0
                 end as in_block  -- not named "flag": that would clash with the existing flag column
          from qz_panel_census_228 t
         ) t
    where in_block = 1
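Edit:

Unless you add an extra count for every possible 5-row window (3 preceding and 1 following, 2 preceding and 2 following, and so on), this approach will not work. That leads to ugly code and is not very flexible.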
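To see why the two windows alone are not enough, here is a small Python sketch that mimics the two windowed counts on made-up data for a single machine. The function name `census_window_hits` and the sample flags are assumptions for illustration only; they are not part of the original query.

```python
def census_window_hits(flags, size=5):
    """Return the indexes the two-window check would keep.

    A row is kept when either the `size` rows ending at it or the
    `size` rows starting at it are all 'census'.
    """
    hits = []
    for i in range(len(flags)):
        # rows between (size - 1) preceding and current row
        backward = flags[max(0, i - size + 1): i + 1]
        # rows between current row and (size - 1) following
        forward = flags[i: i + size]
        if backward.count("census") == size or forward.count("census") == size:
            hits.append(i)
    return hits


# Hypothetical data: one block of exactly 5 census rows at indexes 1..5
flags = ["other"] + ["census"] * 5 + ["other"]
print(census_window_hits(flags))  # [1, 5]
```

Only the first and the last row of the block are kept; the three rows in the middle are never the start or the end of a full 5-row window.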

The usual way to solve this kind of gaps-and-islands problem is to first assign consecutive rows to a common group:

    select *
    from (select t2.*,
                 count(*) over (partition by machine_id, date, grp) as cnt
          from (select t1.*
                from (select t.*,
                             -- keep the same number for 'census' rows
                             sum(case when flag = 'census' then 0 else 1 end)
                                 over (partition by machine_id, date
                                       order by ss2k
                                       rows unbounded preceding) as grp
                      from qz_panel_census_228 t
                     ) t1
                where flag = 'census'  -- only census rows
               ) as t2
         ) t3
    where cnt >= 5  -- only groups of at least 5 census rows
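As a quick sanity check, the same grouping idea can be tried on a few made-up rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or newer). The table name `readings` and the sample data are assumptions, and the `date` column is left out to keep it short.

```python
import sqlite3

# Hypothetical sample: machine 1 has a run of 5 census rows, then a run of 2;
# machine 2 has a run of 4. Only the run of 5 should come back.
rows = (
    [(1, s, "census") for s in range(1, 6)]      # ss2k 1..5: run of 5
    + [(1, 6, "other")]
    + [(1, s, "census") for s in (7, 8)]         # run of 2
    + [(2, s, "census") for s in range(1, 5)]    # run of 4
    + [(2, 5, "other")]
)

con = sqlite3.connect(":memory:")
con.execute("create table readings (machine_id int, ss2k int, flag text)")
con.executemany("insert into readings values (?, ?, ?)", rows)

query = """
select machine_id, ss2k
from (select t1.*,
             count(*) over (partition by machine_id, grp) as cnt
      from (select t.*,
                   -- running count of non-census rows: constant within a census run
                   sum(case when flag = 'census' then 0 else 1 end)
                       over (partition by machine_id
                             order by ss2k
                             rows between unbounded preceding and current row) as grp
            from readings t
           ) t1
      where flag = 'census'
     ) t2
where cnt >= 5
order by machine_id, ss2k
"""
result = con.execute(query).fetchall()
print(result)  # [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5)]
```

The shorter runs on both machines are dropped, and every row of the 5-row run is returned, not just its first and last row.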

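Wow, there must be a better way, but the only one I could find was to build blocks of consecutive 'census' values. It looks terrible, but it might be the catalyst for a better idea.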

with q1 as (
  select
    machine_id, recorded, ss2k, flag, date,
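    case
      when flag = 'census' and (
        -- previous row is not 'census' (or there is no previous row)
        lag (flag) over (order by machine_id, ss2k) is distinct from 'census'
        -- or the previous row belongs to another machine
        or lag (machine_id) over (order by machine_id, ss2k) is distinct from machine_id
      ) then 1
      else 0
    end as block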
  from foo
),
q2 as (
  select
    machine_id, recorded, ss2k, flag, date,
    sum (block) over (order by machine_id, ss2k) as group_id,
    case when flag = 'census' then 1 else 0 end as census
  from q1
),
q3 as (
  select 
    machine_id, recorded, ss2k, flag, date, group_id,
    sum (census) over (partition by group_id order by ss2k) as max_count
  from q2
),
groups as (
  select group_id
  from q3
  group by group_id
  having max (max_count) >= 5
)
select
  q2.machine_id, q2.recorded, q2.ss2k, q2.flag, q2.date
from
  q2
  join groups g on q2.group_id = g.group_id
where
  q2.flag = 'census'

If you run each of the queries in the with clause on its own, I think you will see how this evolves.
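For readers who cannot run the CTEs one by one, the first two steps can be traced by hand. This is a rough Python sketch on made-up flags for a single machine; the list names simply mirror the `block` and `group_id` columns from the query.

```python
from itertools import accumulate

# Hypothetical flags for one machine, already sorted by ss2k
flags = ["other", "census", "census", "other", "census"]

# q1: block = 1 on the first row of each run of 'census' rows
block = []
for i, flag in enumerate(flags):
    previous = flags[i - 1] if i > 0 else None
    block.append(1 if flag == "census" and previous != "census" else 0)

# q2: the running sum of block gives every run its own group_id
group_id = list(accumulate(block))

print(block)     # [0, 1, 0, 0, 1]
print(group_id)  # [0, 1, 1, 1, 2]
```

Note that the 'other' row at index 3 inherits group_id 1 from the run before it, which is why the final select still filters on flag = 'census'.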

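@Gordon Linoff Thanks.

What have you tried so far? We generally like to help people rather than just do everything for them, unless it is something fairly simple.

@ADyson I just edited my post.

Wow, @D you are so smart, I have to say. Let me try the code on my data.

I have been doing this kind of logic with window functions for almost 20 years :-)

Hey @dnoeth, I tried this code on my data. It seems that when a yellow census block has length 5, the code only returns the first and the fifth row. For example, in the first yellow census block only the first and fifth rows are returned. Is there a way to return all rows in the block? Thanks.

You're right. I don't think the expected result can be obtained with this approach. Wait...

Hi @dnoeth, thanks for everything you've done. But this code seems to return almost all census rows in the data, not only the census rows in the yellow blocks. I tried the code on my data.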