BigQuery moving average with missing values
I have the following data:
with dummy_data as
(
SELECT '2017-01-01' as ref_month, 18 as value, 1 as id
UNION ALL SELECT '2017-02-01' as ref_month, 20 as value, 1 as id
UNION ALL SELECT '2017-03-01' as ref_month, 22 as value, 1 as id
-- UNION ALL SELECT '2017-04-01' as ref_month, 28 as value, 1 as id
UNION ALL SELECT '2017-05-01' as ref_month, 30 as value, 1 as id
UNION ALL SELECT '2017-06-01' as ref_month, 37 as value, 1 as id
UNION ALL SELECT '2017-07-01' as ref_month, 42 as value, 1 as id
-- UNION ALL SELECT '2017-08-01' as ref_month, 55 as value, 1 as id
-- UNION ALL SELECT '2017-09-01' as ref_month, 49 as value, 1 as id
UNION ALL SELECT '2017-10-01' as ref_month, 51 as value, 1 as id
UNION ALL SELECT '2017-11-01' as ref_month, 57 as value, 1 as id
UNION ALL SELECT '2017-12-01' as ref_month, 56 as value, 1 as id
UNION ALL SELECT '2017-01-01' as ref_month, 18 as value, 2 as id
UNION ALL SELECT '2017-02-01' as ref_month, 20 as value, 2 as id
UNION ALL SELECT '2017-03-01' as ref_month, 22 as value, 2 as id
UNION ALL SELECT '2017-04-01' as ref_month, 28 as value, 2 as id
-- UNION ALL SELECT '2017-05-01' as ref_month, 30 as value, 2 as id
-- UNION ALL SELECT '2017-06-01' as ref_month, 37 as value, 2 as id
UNION ALL SELECT '2017-07-01' as ref_month, 42 as value, 2 as id
UNION ALL SELECT '2017-08-01' as ref_month, 55 as value, 2 as id
UNION ALL SELECT '2017-09-01' as ref_month, 49 as value, 2 as id
-- UNION ALL SELECT '2017-10-01' as ref_month, 51 as value, 2 as id
UNION ALL SELECT '2017-11-01' as ref_month, 57 as value, 2 as id
UNION ALL SELECT '2017-12-01' as ref_month, 56 as value, 2 as id
)
I want to compute a moving average for each id. I know you can do something like the following:
select
id
, ref_month
, avg(value) over (partition by id order by ref_month ROWS BETWEEN 5 PRECEDING AND CURRENT ROW ) as moving_avg
from
dummy_data
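But as you can see from my dummy data, some values are missing.
How can I easily compute the moving average when some values are missing?
I was thinking of first computing a complete date range: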
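To make the problem concrete, here is a minimal sketch on a subset of the id 1 rows. It assumes `ref_month` is a DATE rather than a string and uses a five-row frame (`4 PRECEDING`); both are choices made for illustration, not taken from the query above. A `ROWS` frame counts physical rows, so when months are missing it silently reaches further back in time.

```sql
#standardSQL
WITH dummy_data AS (
  SELECT DATE '2017-03-01' AS ref_month, 22 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-05-01', 30, 1
  UNION ALL SELECT DATE '2017-06-01', 37, 1
  UNION ALL SELECT DATE '2017-07-01', 42, 1
  -- 2017-08-01 and 2017-09-01 are missing
  UNION ALL SELECT DATE '2017-10-01', 51, 1
)
SELECT
  id,
  ref_month,
  MIN(ref_month) OVER w AS frame_start,  -- earliest month inside the frame
  AVG(value) OVER w AS rows_avg
FROM dummy_data
WINDOW w AS (PARTITION BY id ORDER BY ref_month ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
ORDER BY id, ref_month
-- For 2017-10-01 the frame starts at 2017-03-01, and
-- rows_avg = (22 + 30 + 37 + 42 + 51) / 5 = 36.4
```

The October row averages over eight calendar months instead of five, which is the behaviour the question is trying to avoid.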
date_range AS
(
SELECT reference_month
FROM UNNEST(
GENERATE_DATE_ARRAY(PARSE_DATE('%Y-%m-%d', (SELECT MIN(ref_month) FROM dummy_data)), PARSE_DATE('%Y-%m-%d', (SELECT MAX(ref_month) FROM dummy_data)), INTERVAL 1 MONTH)
) AS reference_month
)
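then taking the Cartesian product with the ids and joining back to my dummy data, but this seems like an anti-pattern. Any ideas on how to do this in the best way?
Thanks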
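For reference, the date-range approach described above might look roughly like the sketch below. The CTE names `ids` and `filled`, the `has_value` flag, the DATE-typed sample rows, and the five-month frame are all assumptions made for this illustration. Note that the date range is global, so an id whose first month is later than the overall minimum would get leading zero rows.

```sql
#standardSQL
WITH dummy_data AS (
  SELECT DATE '2017-01-01' AS ref_month, 18 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-02-01', 20, 1
  UNION ALL SELECT DATE '2017-03-01', 22, 1
  -- 2017-04-01 is missing
  UNION ALL SELECT DATE '2017-05-01', 30, 1
),
date_range AS (
  SELECT reference_month
  FROM UNNEST(GENERATE_DATE_ARRAY(
    (SELECT MIN(ref_month) FROM dummy_data),
    (SELECT MAX(ref_month) FROM dummy_data),
    INTERVAL 1 MONTH)) AS reference_month
),
ids AS (
  SELECT DISTINCT id FROM dummy_data
),
filled AS (
  -- one row per (id, month); months without data get value 0
  SELECT
    i.id,
    r.reference_month AS ref_month,
    d.value IS NOT NULL AS has_value,
    IFNULL(d.value, 0) AS value
  FROM ids AS i
  CROSS JOIN date_range AS r
  LEFT JOIN dummy_data AS d
    ON d.id = i.id AND d.ref_month = r.reference_month
)
SELECT id, ref_month, moving_avg
FROM (
  SELECT
    id, ref_month, has_value,
    AVG(value) OVER (PARTITION BY id ORDER BY ref_month
                     ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_avg
  FROM filled
)
WHERE has_value  -- drop the synthetic rows again
ORDER BY id, ref_month
-- moving_avg: 18.0, 19.0, 20.0, 18.0
```

Because every month now has a row, a plain `ROWS` frame lines up with calendar months; the cost is the extra cross join and join back.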
Edit:
Expected result:
For id 1:
2017-01-01 18
2017-02-01 19
2017-03-01 20
2017-05-01 18
2017-06-01 21.8
2017-07-01 26.2
2017-10-01 26
2017-11-01 30
2017-12-01 32.8
For id 2:
2017-01-01 18
2017-02-01 19
2017-03-01 20
2017-04-01 22
2017-07-01 18.4
2017-08-01 25
2017-09-01 29.2
2017-11-01 40.6
2017-12-01 43.4
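The expected figures above are consistent with a five-month calendar window (the current month plus the four before it), with missing months counted as 0 and the divisor capped at the number of months since the first one. That reading is inferred from the numbers rather than stated in the question. A few of the values, worked out by hand:

```sql
#standardSQL
SELECT
  (18 + 20) / 2               AS id1_2017_02,  -- only two months exist so far -> 19.0
  (18 + 20 + 22 + 0 + 30) / 5 AS id1_2017_05,  -- Jan..May, April counted as 0 -> 18.0
  (37 + 42 + 0 + 0 + 51) / 5  AS id1_2017_10,  -- Jun..Oct, Aug and Sep counted as 0 -> 26.0
  (22 + 28 + 0 + 0 + 42) / 5  AS id2_2017_07   -- Mar..Jul, May and Jun counted as 0 -> 18.4
```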
This should work:
with dummy_data as
(
SELECT '2017-01-01' as ref_month, 18 as value, 1 as id
UNION ALL SELECT '2017-02-01' as ref_month, 20 as value, 1 as id
UNION ALL SELECT '2017-03-01' as ref_month, 22 as value, 1 as id
-- UNION ALL SELECT '2017-04-01' as ref_month, 28 as value, 1 as id
UNION ALL SELECT '2017-05-01' as ref_month, 30 as value, 1 as id
UNION ALL SELECT '2017-06-01' as ref_month, 37 as value, 1 as id
UNION ALL SELECT '2017-07-01' as ref_month, 42 as value, 1 as id
-- UNION ALL SELECT '2017-08-01' as ref_month, 55 as value, 1 as id
-- UNION ALL SELECT '2017-09-01' as ref_month, 49 as value, 1 as id
UNION ALL SELECT '2017-10-01' as ref_month, 51 as value, 1 as id
UNION ALL SELECT '2017-11-01' as ref_month, 57 as value, 1 as id
UNION ALL SELECT '2017-12-01' as ref_month, 56 as value, 1 as id
UNION ALL SELECT '2017-01-01' as ref_month, 18 as value, 2 as id
UNION ALL SELECT '2017-02-01' as ref_month, 20 as value, 2 as id
UNION ALL SELECT '2017-03-01' as ref_month, 22 as value, 2 as id
UNION ALL SELECT '2017-04-01' as ref_month, 28 as value, 2 as id
-- UNION ALL SELECT '2017-05-01' as ref_month, 30 as value, 2 as id
-- UNION ALL SELECT '2017-06-01' as ref_month, 37 as value, 2 as id
UNION ALL SELECT '2017-07-01' as ref_month, 42 as value, 2 as id
UNION ALL SELECT '2017-08-01' as ref_month, 55 as value, 2 as id
UNION ALL SELECT '2017-09-01' as ref_month, 49 as value, 2 as id
-- UNION ALL SELECT '2017-10-01' as ref_month, 51 as value, 2 as id
UNION ALL SELECT '2017-11-01' as ref_month, 57 as value, 2 as id
UNION ALL SELECT '2017-12-01' as ref_month, 56 as value, 2 as id
)
select
id
, ref_month
, avg(avg(value)) over (partition by id order by ref_month) as moving_avg
from
dummy_data
group by id
, ref_month
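As a side note on how the window above is evaluated: when a window has an `ORDER BY` but no frame clause, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. The sketch below, on a DATE-typed subset of the id 1 rows (an assumption for illustration), writes that frame out next to the implicit form.

```sql
#standardSQL
WITH dummy_data AS (
  SELECT DATE '2017-01-01' AS ref_month, 18 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-02-01', 20, 1
  UNION ALL SELECT DATE '2017-03-01', 22, 1
  -- 2017-04-01 is missing
  UNION ALL SELECT DATE '2017-05-01', 30, 1
)
SELECT
  id,
  ref_month,
  -- no frame clause: the default frame applies
  AVG(AVG(value)) OVER (PARTITION BY id ORDER BY ref_month) AS implicit_frame,
  -- the same frame written out explicitly
  AVG(AVG(value)) OVER (
    PARTITION BY id ORDER BY ref_month
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS explicit_frame
FROM dummy_data
GROUP BY id, ref_month
ORDER BY id, ref_month
-- both columns: 18.0, 19.0, 20.0, 22.5
```

Each row therefore averages every earlier month of the same id, which is a running average from the start of the series; a fixed-length window would need an explicit frame.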
If you want the missing values to be treated as 0 and you want a window of 5, a series of lags is probably the simplest approach:
-- ref_month is a STRING in the sample data, so it is cast to DATE in the subquery below
select id, ref_month,
       (value +
        -- add each earlier row only if it falls inside the 5-month window
        (case when lag(ref_month, 1) over (partition by id order by ref_month) >= date_sub(ref_month, interval 4 month)
              then lag(value, 1) over (partition by id order by ref_month)
              else 0
         end) +
        (case when lag(ref_month, 2) over (partition by id order by ref_month) >= date_sub(ref_month, interval 4 month)
              then lag(value, 2) over (partition by id order by ref_month)
              else 0
         end) +
        (case when lag(ref_month, 3) over (partition by id order by ref_month) >= date_sub(ref_month, interval 4 month)
              then lag(value, 3) over (partition by id order by ref_month)
              else 0
         end) +
        (case when lag(ref_month, 4) over (partition by id order by ref_month) >= date_sub(ref_month, interval 4 month)
              then lag(value, 4) over (partition by id order by ref_month)
              else 0
         end)
       ) /
       -- divide by 5, or by the number of months elapsed while the window is still filling up
       least(5, date_diff(ref_month, min(ref_month) over (partition by id), month) + 1) as moving_avg
from (select id, cast(ref_month as date) as ref_month, value from dummy_data) as d;
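The divisor in the lag query handles the start of each series: it is the number of months elapsed since the first month of the id, capped at 5. A minimal sketch of just that expression, using a hard-coded first month of 2017-01-01 and a hypothetical column name `months_in_window`:

```sql
#standardSQL
SELECT
  ref_month,
  -- DATE_DIFF(a, b, MONTH) counts month boundaries from b to a
  LEAST(5, DATE_DIFF(ref_month, DATE '2017-01-01', MONTH) + 1) AS months_in_window
FROM UNNEST([DATE '2017-01-01', DATE '2017-03-01',
             DATE '2017-05-01', DATE '2017-10-01']) AS ref_month
ORDER BY ref_month
-- months_in_window: 1, 3, 5, 5
```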
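The lag query above is more complicated than the logic behind it. It basically adds up the five most recent values and divides by 5, but it also takes into account the boundary conditions and the missing values.
Below is for BigQuery Standard SQL, and it actually works! :o)
It assumes your ref_month is of DATE data type; if you have it as a STRING, that is still OK: see the note at the bottom of my answer.
#standardSQL
SELECT
  id,
  ref_month,
  SUM(value) OVER (rolling_six_days) /
  ( LAST_VALUE(month_pos) OVER (rolling_six_days)
    - FIRST_VALUE(month_pos) OVER (rolling_six_days)
    + 1
  ) AS correct_moving_avg
FROM (
  SELECT id, ref_month, value,
    DATE_DIFF(ref_month, '2016-01-01', MONTH) month_pos
  FROM dummy_data
)
WINDOW rolling_six_days AS
  (PARTITION BY id ORDER BY month_pos RANGE BETWEEN 5 PRECEDING AND CURRENT ROW)
You can test / play with it using the sample data below:
#standardSQL
WITH dummy_data AS (
  SELECT DATE '2017-01-01' AS ref_month, 18 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-02-01' AS ref_month, 20 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-03-01' AS ref_month, 22 AS value, 1 AS id
  -- UNION ALL SELECT DATE '2017-04-01' AS ref_month, 28 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-05-01' AS ref_month, 30 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-06-01' AS ref_month, 37 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-07-01' AS ref_month, 42 AS value, 1 AS id
  -- UNION ALL SELECT DATE '2017-08-01' AS ref_month, 55 AS value, 1 AS id
  -- UNION ALL SELECT DATE '2017-09-01' AS ref_month, 49 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-10-01' AS ref_month, 51 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-11-01' AS ref_month, 57 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-12-01' AS ref_month, 56 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-01-01' AS ref_month, 18 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-02-01' AS ref_month, 20 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-03-01' AS ref_month, 22 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-04-01' AS ref_month, 28 AS value, 2 AS id
  -- UNION ALL SELECT DATE '2017-05-01' AS ref_month, 30 AS value, 2 AS id
  -- UNION ALL SELECT DATE '2017-06-01' AS ref_month, 37 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-07-01' AS ref_month, 42 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-08-01' AS ref_month, 55 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-09-01' AS ref_month, 49 AS value, 2 AS id
  -- UNION ALL SELECT DATE '2017-10-01' AS ref_month, 51 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-11-01' AS ref_month, 57 AS value, 2 AS id
  UNION ALL SELECT DATE '2017-12-01' AS ref_month, 56 AS value, 2 AS id
)
SELECT
  id,
  ref_month,
  SUM(value) OVER (rolling_six_days) /
  ( LAST_VALUE(month_pos) OVER (rolling_six_days)
    - FIRST_VALUE(month_pos) OVER (rolling_six_days)
    + 1
  ) AS correct_moving_avg
FROM (
  SELECT id, ref_month, value,
    DATE_DIFF(ref_month, '2016-01-01', MONTH) month_pos
  FROM dummy_data
)
WINDOW rolling_six_days AS
  (PARTITION BY id ORDER BY month_pos RANGE BETWEEN 5 PRECEDING AND CURRENT ROW)
ORDER BY 1, 2
To help you explore the logic, see below the expanded version of the above query; it propagates all intermediate values up to the very outer SELECT so you can see everything:
#standardSQL
WITH dummy_data AS (
  SELECT DATE '2017-01-01' AS ref_month, 18 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-02-01' AS ref_month, 20 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-03-01' AS ref_month, 22 AS value, 1 AS id
  -- UNION ALL SELECT DATE '2017-04-01' AS ref_month, 28 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-05-01' AS ref_month, 30 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-06-01' AS ref_month, 37 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-07-01' AS ref_month, 42 AS value, 1 AS id
  -- UNION ALL SELECT DATE '2017-08-01' AS ref_month, 55 AS value, 1 AS id
  -- UNION ALL SELECT DATE '2017-09-01' AS ref_month, 49 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-10-01' AS ref_month, 51 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-11-01' AS ref_month, 57 AS value, 1 AS id
  UNION ALL SELECT DATE '2017-12-01' AS ref_month, 56 AS value, 1 AS id
  UNION ALL SELECT DATE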