Generating a time series with daily statistics using a PostgreSQL query


I find myself having to write what is (for me) a fairly complex SQL query, and I can't seem to get my head around it.

I have a table named `orders` and a related table `order_state_history` that records the state of those orders over time (see below).

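Now I need to generate a series of rows (one row per day) containing the number of orders that are in a particular state at the end of that day (see the expected result below). In addition, I only want to consider orders with `orders.type = 1`. The data resides in a PostgreSQL database. I have already found out how to generate a time series with `generate_series(DATE '2001-01-01', CURRENT_DATE, '1 day'::INTERVAL) days`, which lets me produce rows for days on which no state change was recorded.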
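
For reference, a minimal sketch of that series on its own. The `day` alias and the cast to `date` are illustrative choices, not part of the schema described here:

```sql
-- One row per calendar day, from 2001-01-01 through today (inclusive).
-- generate_series returns timestamps here, so cast back to date for display.
select day::date as day
from generate_series(date '2001-01-01', current_date, interval '1 day') as day;
```

Left-joining other data onto such a series keeps a row for every day, including days without any state change.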

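My current approach is to join `orders`, `order_state_history` and the generated series of `days` together, filter out all rows with `DATE(order_state_history.timestamp) > DATE(days)`, and then somehow get the final state of each order on that day with `first_value(order_state_history.new_state) OVER (PARTITION BY orders.id ORDER BY order_state_history.timestamp DESC)`. This is where my limited SQL experience fails me.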

I just can't wrap my head around this problem.

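Can this even be solved in a single query, or would I be better off computing the data with some kind of script that runs one query per day? What would be a reasonable approach to this problem?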

orders===            
id       type        
10000    1        
10001    1        
10002    2        
10003    2        
10004    1        


order_state_history===            
order_id    index    timestamp           new_state
10000       1        01.01.2001 12:00    NEW
10000       2        02.01.2001 13:00    ACTIVE
10000       3        03.01.2001 14:00    DONE
10001       1        02.01.2001 13:00    NEW
10002       1        03.01.2001 14:00    NEW
10002       2        05.01.2001 10:00    ACTIVE
10002       3        05.01.2001 14:00    DONE
10003       1        07.01.2001 04:00    NEW
10004       1        05.01.2001 14:00    NEW
10004       2        10.01.2001 17:30    DONE


Expected result===            
date          new_orders    active_orders    done_orders
01.01.2001    1             0                0
02.01.2001    1             1                0
03.01.2001    1             0                1
04.01.2001    1             0                1
05.01.2001    2             0                1
06.01.2001    2             0                1
07.01.2001    2             0                1
08.01.2001    2             0                1
09.01.2001    2             0                1
10.01.2001    1             0                2

Step 1. Calculate the cumulative sum of states for each order, using the values NEW = 1, ACTIVE = 1, DONE = 2:

select 
    order_id, timestamp::date as day, 
    sum(case new_state when 'DONE' then 2 else 1 end) over w as state
from order_state_history h
join orders o on o.id = h.order_id
where o.type = 1
window w as (partition by order_id order by timestamp)

 order_id |    day     | state 
----------+------------+-------
    10000 | 2001-01-01 |     1
    10000 | 2001-01-02 |     2
    10000 | 2001-01-03 |     4
    10001 | 2001-01-02 |     1
    10004 | 2001-01-05 |     1
    10004 | 2001-01-10 |     3
(6 rows)

Step 2. Calculate a transition matrix for each order based on the states from step 1 (2 means NEW -> ACTIVE, 3 means NEW -> DONE, 4 means ACTIVE -> DONE):
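
This step corresponds to the middle subquery of the full query in Step 3. Run on its own, it would look roughly like the sketch below; the `order by` at the end is added only to make the output easier to read.

```sql
-- Step 2 on its own: turn each cumulative state into per-day deltas.
select
    order_id, day, state,
    -- +1 when the order enters NEW, -1 when it leaves NEW (state 2 or 3)
    case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
    -- +1 on NEW -> ACTIVE (state 2), -1 on ACTIVE -> DONE (state 4)
    case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
    -- +1 when the order reaches DONE (state 3 or 4)
    case when state > 2 then 1 else 0 end as done
from (
    -- the Step 1 query
    select
        order_id, timestamp::date as day,
        sum(case new_state when 'DONE' then 2 else 1 end) over w as state
    from order_state_history h
    join orders o on o.id = h.order_id
    where o.type = 1
    window w as (partition by order_id order by timestamp)
) s
order by order_id, day;
```

With the sample data, order 10000 would contribute (new, active, done) = (1, 0, 0) on 2001-01-01, (-1, 1, 0) on 2001-01-02 and (0, -1, 1) on 2001-01-03.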

Step 3. Calculate the cumulative sum of each state over a series of days:

select distinct
    day::date,
    sum(new) over w as new,
    sum(active) over w as active,
    sum(done) over w as done
from generate_series('2001-01-01'::date, '2001-01-10', '1d'::interval) day
left join (
    select 
        order_id, day, state,
        case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
        case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
        case when state > 2 then 1 else 0 end as done
    from (
        select 
            order_id, timestamp::date as day, 
            sum(case new_state when 'DONE' then 2 else 1 end) over w as state
        from order_state_history h
        join orders o on o.id = h.order_id
        where o.type = 1
        window w as (partition by order_id order by timestamp)
        ) s
    ) s
using(day)
window w as (order by day)
order by 1

    day     | new | active | done 
------------+-----+--------+------
 2001-01-01 |   1 |      0 |    0
 2001-01-02 |   1 |      1 |    0
 2001-01-03 |   1 |      0 |    1
 2001-01-04 |   1 |      0 |    1
 2001-01-05 |   2 |      0 |    1
 2001-01-06 |   2 |      0 |    1
 2001-01-07 |   2 |      0 |    1
 2001-01-08 |   2 |      0 |    1
 2001-01-09 |   2 |      0 |    1
 2001-01-10 |   1 |      0 |    2
(10 rows)   
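
The approach described in the question (take each order's latest state as of each day, then count per state) can also be sketched directly. This is an alternative under stated assumptions, not the method used in the answer above: it assumes PostgreSQL 9.4 or later for the `filter` clause, uses the `index` column as a tie-breaker for identical timestamps, and the aliases `d` and `last_state` are made up for illustration.

```sql
select
    day,
    count(*) filter (where new_state = 'NEW')    as new_orders,
    count(*) filter (where new_state = 'ACTIVE') as active_orders,
    count(*) filter (where new_state = 'DONE')   as done_orders
from (
    -- For every (day, order) pair keep only the most recent state change
    -- that happened on or before that day.
    select distinct on (d.day, h.order_id)
        d.day::date as day, h.order_id, h.new_state
    from generate_series(date '2001-01-01', date '2001-01-10', interval '1 day') as d(day)
    join orders o on o.type = 1
    join order_state_history h
      on h.order_id = o.id
     and h.timestamp::date <= d.day::date
    order by d.day, h.order_id, h.timestamp desc, h.index desc
) last_state
group by day
order by day;
```

For the sample data this should give the same counts as the result above. Two caveats: days before the first recorded state change would be missing rather than shown with zeros, because the joins are inner joins, and the intermediate row count grows with the number of days times the number of history rows, so the cumulative-sum approach is likely to scale better on large tables.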

Please check the expected result (why are there two new orders on 03.01?) and add the next expected rows, at least up to 05.01.

I have added all relevant rows. There are two new orders on 03.01 because new orders were created on both 02.01 and 03.01 (10001 and 10002). Order 10001 stays in state NEW and is therefore counted on all following days. The counts are totals: the result column `new_orders` counts all orders that are in state NEW at the end of that day, regardless of whether their state changed on that day.

But 10002 is of type 2, so it should not be counted?

You are right, of course. I have updated the data accordingly.