Google BigQuery: window aggregation, year over year


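I'm trying to use a window function to get, for every day on which a SKU had sales, the total quantity of that SKU sold over the last 365 days. If the SKU sold every day, I could use ROWS ... PRECEDING, etc.: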

ORDER BY
      CalendarFullDate ROWS BETWEEN 364 PRECEDING AND CURRENT ROW
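But in this case the dates are unevenly spaced and there are many days with no sales (i.e. I can't just go back 364 rows on the assumption that every day has a sale).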
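To make the difference concrete, here is a small pure-Python sketch (not BigQuery; the helpers `trailing_365_sum` and `rows_frame_sum` and the five-row subset of SKU "1"'s sample data are illustrative assumptions) that compares a frame counted in rows (like ROWS BETWEEN 364 PRECEDING AND CURRENT ROW) with a frame counted in days (the current date plus the 364 days before it).

```python
from datetime import date, timedelta

# A few of SKU "1"'s sample rows: (CalendarFullDate, DailySalesQty)
SALES = [
    (date(2017, 4, 15), 136.0),
    (date(2017, 5, 6), 68.0),
    (date(2017, 10, 21), 164.0),
    (date(2018, 3, 31), 244.0),
    (date(2018, 4, 28), 97.0),
]


def trailing_365_sum(sales, day, window_days=365):
    """Sum quantities dated in [day - (window_days - 1), day], i.e. 364 days back plus the current day."""
    start = day - timedelta(days=window_days - 1)
    return sum(qty for d, qty in sales if start <= d <= day)


def rows_frame_sum(sales, day, preceding_rows):
    """Sum the row for `day` plus up to `preceding_rows` earlier rows, ignoring how far
    apart the dates are (like ROWS BETWEEN n PRECEDING AND CURRENT ROW).
    `day` must be one of the dates in `sales`."""
    ordered = sorted(sales)
    idx = max(i for i, (d, _) in enumerate(ordered) if d == day)
    return sum(qty for _, qty in ordered[max(0, idx - preceding_rows): idx + 1])


if __name__ == "__main__":
    for d, _ in sorted(SALES):
        print(d, trailing_365_sum(SALES, d), rows_frame_sum(SALES, d, 364))
```

On 2018-04-28 the row-based frame still includes the 2017-04-15 sale, which is more than a year old, while the date-based frame excludes it.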

So, for the test sample below, is it possible to use a window with some kind of WHERE condition so that I only sum at most 364 days back?

WITH samples AS (
  SELECT "1" AS SKU, DATE("2018-10-27") AS CalendarFullDate, 86.0 AS DailySalesQty UNION ALL (
  SELECT "1" AS SKU, DATE("2018-10-20"), 84.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-09-29"), 88.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-09-14"), 42.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-09-01"), 21.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-05-05"), 25.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-04-28"), 97.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-03-31"), 244.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-03-24"), 68.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-02-23"), 52.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-02-10"), 48.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-01-21"), 243.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-01-18"), 2.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-01-06"), 190.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-12-26"), 310.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-12-09"), 240.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-11-03"), 30.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-10-21"), 164.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-09-30"), 44.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-09-09"), 55.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-09-01"), 35.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-05-20"), 60.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-05-06"), 68.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-04-15"), 136.0) UNION ALL (

  SELECT "2" AS SKU, DATE("2018-10-24"), 46.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-10-18"), 56.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-09-16"), 19.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-09-02"), 42.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-09-01"), 45.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-07-05"), 25.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-06-28"), 210.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-05-31"), 44.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-05-24"), 168.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-04-23"), 152.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-03-10"), 8.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-02-21"), 23.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-01-18"), 20.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-01-06"), 10.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-12-26"), 30.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-11-09"), 1240.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-11-03"), 323.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-10-21"), 123.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-09-30"), 444.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-09-09"), 555.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-08-01"), 35.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-06-20"), 6.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-05-06"), 68.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-04-15"), 136.0) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-04-09"), 136.0)
)

SELECT 
  SKU, 
  CalendarFullDate, 
  SUM(DailySalesQty) OVER(win)
FROM
  samples WINDOW win AS (
    PARTITION BY
      SKU
    ORDER BY
      CalendarFullDate 
    RANGE BETWEEN DATE_TRUNC(CalendarFullDate,INTERVAL 364 DAY) AND CalendarFullDate)
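I know the code above won't work with RANGE, but it's a kind of pseudo-code for what I actually want to do. I tried a WHERE clause, but that isn't allowed.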

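Is this even possible with a window? It would be a nice, clean approach, but I'm not sure whether I can express such a condition for a window aggregate.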

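Note: this is a cut-down version of the real data, which has 5 partitioning fields and more than 20 measures to aggregate, and is a huge dataset (1 TB), so I'd like the solution to be efficient as well.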
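The efficiency concern is one reason a single RANGE window is attractive: once a partition is sorted by date, the trailing sums can be produced in one sliding pass, whereas a self-join that compares every pair of sales within a partition can grow quadratically with the number of rows. The Python sketch below illustrates that sliding-window idea on a few rows taken from the sample data. The `trailing_sums` helper is hypothetical; it is not BigQuery code and not a description of how BigQuery executes window functions internally. It assumes at most one row per (SKU, date), as in the sample.

```python
from collections import defaultdict
from datetime import date, timedelta

# Rows from the sample data: (SKU, CalendarFullDate, DailySalesQty)
SAMPLE = [
    ("1", date(2018, 10, 27), 86.0),
    ("2", date(2017, 9, 9), 555.0),
    ("2", date(2017, 11, 3), 323.0),
    ("2", date(2017, 11, 9), 1240.0),
    ("2", date(2018, 10, 18), 56.0),
    ("2", date(2018, 10, 24), 46.0),
]


def trailing_sums(rows, window_days=365):
    """Return {(sku, day): sum of qty for that SKU over [day - (window_days - 1), day]}.

    Each SKU's rows are sorted once; a start pointer then only moves forward,
    so the scan over a partition is linear after sorting.
    Assumes at most one row per (sku, day).
    """
    by_sku = defaultdict(list)
    for sku, day, qty in rows:
        by_sku[sku].append((day, qty))

    result = {}
    for sku, items in by_sku.items():
        items.sort()
        start = 0        # index of the oldest row still inside the window
        running = 0.0    # sum of qty for items[start .. i]
        for i, (day, qty) in enumerate(items):
            running += qty
            earliest = day - timedelta(days=window_days - 1)
            # Drop rows that have fallen out of the trailing window.
            while items[start][0] < earliest:
                running -= items[start][1]
                start += 1
            result[(sku, day)] = running
    return result


if __name__ == "__main__":
    for key, total in sorted(trailing_sums(SAMPLE).items()):
        print(key, total)
```

For SKU "2" on 2018-10-24, the 2017-09-09 sale has already dropped out of the window, so only the four later sales are summed.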

Thoughts?

Cheers

Below is for BigQuery Standard SQL:

#standardSQL
SELECT 
    SKU, 
    CalendarFullDate,
    SUM(DailySalesQty) OVER(win) SalesQty365days
FROM (
  SELECT 
    SKU, 
    CalendarFullDate, 
    DailySalesQty,
    UNIX_DATE(CalendarFullDate) unix_days
  FROM samples 
)
WINDOW win AS (
  PARTITION BY SKU ORDER BY unix_days 
  RANGE BETWEEN 364 PRECEDING AND CURRENT ROW
)

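The trick here is to "convert" the DATE-typed CalendarFullDate field into an integer number of days since the epoch (UNIX_DATE returns the number of days since 1970-01-01), so that it can be used in both the ORDER BY and the RANGE parts of the window expression. Because a RANGE frame is defined by the values of the ordering key rather than by a count of rows, RANGE BETWEEN 364 PRECEDING AND CURRENT ROW then covers every row of the same SKU dated from 364 days before the current row up to the current row, however many rows that happens to be.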
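As a local sanity check of the trick, here is a minimal Python sketch of the same arithmetic. The helpers `unix_days` (assumed to mirror what UNIX_DATE returns for a DATE) and `in_range_frame` are hypothetical names used only for illustration; they are not part of BigQuery.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def unix_days(d: date) -> int:
    """Days since 1970-01-01, the same count UNIX_DATE produces for a DATE."""
    return (d - EPOCH).days


def in_range_frame(current: date, other: date, preceding: int = 364) -> bool:
    """True if `other` falls inside RANGE BETWEEN `preceding` PRECEDING AND CURRENT ROW
    when rows are ordered by unix_days."""
    return unix_days(current) - preceding <= unix_days(other) <= unix_days(current)


if __name__ == "__main__":
    today = date(2018, 10, 27)
    for d in [date(2017, 10, 21), date(2017, 10, 28), date(2017, 11, 3)]:
        print(d, unix_days(d), in_range_frame(today, d))
```

For SKU "1" on 2018-10-27 the frame therefore reaches back to 2017-10-28: the 2017-11-03 sale is included and the 2017-10-21 sale is not.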

