SQL: How to insert non-grouped data


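Inspired by this, I wrote the following query, which returns averages computed over 5-minute intervals for the past year.

What I want is every 5-minute interval, with the value set to null when no rows fall within a particular time span.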

with intervals as (select
                     (select min("timestamp") from public.hst_energy_d) + n AS start_timestamp,
                     (select min("timestamp") from public.hst_energy_d) + n + 299 AS end_timestamp
                   from generate_series(extract(epoch from now())::BIGINT - 10596096000, extract(epoch from now())::BIGINT, 300) n)
(SELECT AVG(meas."Al1") as "avg", islots.start_timestamp AS "timestamp"
FROM public.hst_energy_d meas
  RIGHT OUTER JOIN intervals islots
    on meas.timestamp >= islots.start_timestamp and meas.timestamp <= islots.end_timestamp
WHERE
  meas.idinstrum = 4
  AND
  meas.id_device = 122
  AND
  meas.timestamp > extract(epoch from now()) - 10596096000
GROUP BY islots.start_timestamp, islots.end_timestamp
ORDER BY timestamp);

I think this post will help you.


It shows one way of grouping by dates. I suggest you build a scalar function.

I think I understand what you are after. I wonder whether liberal use of interval '5 minutes' would be a better, easier-to-follow approach:

with times as (  -- find the first date in the dataset, up to today
  select
    date_trunc ('minutes', min("timestamp")) - 
    mod (extract ('minutes' from min("timestamp"))::int, 5) * interval '1 minute' as bt,
    date_trunc ('minutes', current_timestamp) - 
    mod (extract ('minutes' from current_timestamp)::int, 5) * interval '1 minute' as et
  from hst_energy_d
  where
    idinstrum = 4 and
    id_device = 122
), -- generate every possible range between these dates
ranges as (
  select
    generate_series(bt, et, interval '5 minutes') as range_start
  from times
), -- normalize your data to which 5-minute interval it belongs to
rounded_hst as (
  select
    date_trunc ('minutes', "timestamp") - 
    mod (extract ('minutes' from "timestamp")::int, 5) * interval '1 minute' as round_time,
    *
  from hst_energy_d
  where
    idinstrum = 4 and
    id_device = 122  
)
select
  r.range_start, r.range_start + interval '5 minutes' as range_end,
  avg (hd."Al1")
from
  ranges r
  left join rounded_hst hd on
    r.range_start = hd.round_time
group by
  r.range_start
order by
  r.range_start
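
By the way, a keen eye might wonder why I bother with the CTE rounded_hst instead of simply using a between in the join. From everything I have tested and observed, the database will explode out all the possible combinations and then test the between condition as if it were a where clause, that is, a filtered cartesian product. With this many intervals, that is bound to be a killer.

Truncating each row to its five-minute interval allows a standard SQL equality join. I encourage you to test both; I think you will see what I mean.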
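
To make the comparison concrete, here is a minimal sketch that runs both join styles against a tiny data set and checks that they return the same rows, including a null average for the empty interval. Everything in it is an assumption made for illustration: it uses Python's built-in sqlite3 rather than PostgreSQL, a hypothetical readings table with integer epoch timestamps, and a recursive CTE as a stand-in for generate_series. It only shows that the two forms are equivalent in result; it says nothing about their relative cost in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (ts INTEGER, value REAL);  -- hypothetical table
    INSERT INTO readings VALUES
        (10, 1.0), (200, 3.0),  -- both fall in the interval starting at 0
        (650, 5.0);             -- falls in the interval starting at 600
""")

# Interval starts 0, 300, 600: a stand-in for generate_series(0, 600, 300).
RANGES = """
    WITH RECURSIVE ranges(range_start) AS (
        SELECT 0
        UNION ALL
        SELECT range_start + 300 FROM ranges WHERE range_start < 600
    )
"""

# Style 1: range condition in the join (the "between" style).
range_join = conn.execute(RANGES + """
    SELECT r.range_start, AVG(m.value)
    FROM ranges r
    LEFT JOIN readings m
      ON m.ts >= r.range_start AND m.ts < r.range_start + 300
    GROUP BY r.range_start
    ORDER BY r.range_start
""").fetchall()

# Style 2: round each row to its interval first, then join on equality.
equality_join = conn.execute(RANGES + """
    , rounded AS (
        SELECT ts - ts % 300 AS round_time, value FROM readings
    )
    SELECT r.range_start, AVG(m.value)
    FROM ranges r
    LEFT JOIN rounded m ON m.round_time = r.range_start
    GROUP BY r.range_start
    ORDER BY r.range_start
""").fetchall()

print(range_join)
print(equality_join)
```

Both queries yield an average of 2.0 for the interval starting at 0, NULL (None in Python) for the empty interval starting at 300, and 5.0 for the interval starting at 600. Note that the filters on the measurement rows would belong inside the rounded CTE or the ON clause; putting them in an outer WHERE would discard the empty intervals again.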

-- Edit 2016-11-17 --

A solution that takes into account that the times in the OP are numbers, not dates:

with times as (  -- find the first date in the dataset, up to today
    select
      date_trunc('minutes', to_timestamp(min("timestamp"))::timestamp) -
      mod(extract ('minutes' from to_timestamp(min("timestamp"))::timestamp)::int, 5) * interval '1 minute' as bt,
      date_trunc('minutes', current_timestamp::timestamp) -
      mod(extract ('minutes' from (current_timestamp)::timestamp)::int, 5) * interval '1 minute' as et
    from hst_energy_d
    where
      idinstrum = 4 and
      id_device = 122
), -- generate every possible range between these dates
    ranges as (
      select
        generate_series(bt, et, interval '5 minutes') as range_start
      from times
  ), -- normalize your data to which 5-minute interval it belongs to
    rounded_hst as (
      select
        date_trunc ('minutes', to_timestamp("timestamp")::timestamp)::timestamp -
        mod (extract ('minutes' from (to_timestamp("timestamp")::timestamp))::int, 5) * interval '1 minute' as round_time,
        *
      from hst_energy_d
      where
        idinstrum = 4 and
        id_device = 122
  )
select
  extract('epoch' from r.range_start)::bigint, extract('epoch' from r.range_start + interval '5 minutes')::bigint as range_end,
  avg (hd."Al1")
from
  ranges r
  left join rounded_hst hd on
                             r.range_start = hd.round_time
group by
  r.range_start
order by
  r.range_start;
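
Since the column holds epoch seconds, the same interval start can also be computed with plain integer arithmetic instead of converting to a timestamp and back. The sketch below is not part of the original answer; it is a small Python check, under the assumption that everything is interpreted in UTC, that "truncate to the minute, then subtract minute mod 5" and "subtract the value modulo 300" give the same result. The function names and sample values are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def bucket_via_truncation(epoch_seconds: int) -> int:
    """Mirror date_trunc('minutes', ts) - mod(minute, 5) * interval '1 minute' in UTC."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    truncated = ts.replace(second=0, microsecond=0)
    rounded = truncated - timedelta(minutes=truncated.minute % 5)
    return int(rounded.timestamp())


def bucket_via_modulo(epoch_seconds: int) -> int:
    """Round down to a multiple of 300 seconds (five minutes)."""
    return epoch_seconds - epoch_seconds % 300


# Hypothetical sample values around an interval boundary.
samples = [1479375000, 1479375123, 1479375299, 1479375300, 1479378899]
for s in samples:
    print(s, bucket_via_truncation(s), bucket_via_modulo(s))
```

In PostgreSQL terms this would correspond to an expression along the lines of "timestamp" - ("timestamp" % 300) on the BIGINT column, paired with an integer generate_series. That form may also sidestep the session time zone, which the ::timestamp casts above depend on, but it is worth testing against your own data before relying on it.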

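What a neat query! Unfortunately, timestamp is a BIGINT and not a real timestamp, so I used number math instead of timestamp math. I modified the code you suggested to work with BIGINT, but I am not sure whether it can still take advantage of the lighter SQL. Could you take a look and give me some feedback?

Also: you said Postgres performs better with your solution. How did you analyze that? By using EXPLAIN? Here are the steps executed for my query. Is it really performing two scans on hst_energy_d? That is too many! Thank you.

OK, EXPLAIN speaks for itself: your solution has a cost of 48.47, compared with 247.17 for mine.

Aah, I did not realize the data type was numeric. Sorry about that. To be honest, I was scratching my head when you mentioned "epoch", but now it makes complete sense. I will revise my answer with your solution. I am glad this was helpful. Also, regarding the explain plan: yes, I know what you are thinking. I knew to do it this way from past experience. Any time I see a between in a join, I know there is going to be trouble. The syntax is nice and clean, but the execution is brutal.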