SQL operations on a table (aggregating and grouping)

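I want to run a daily query in BigQuery that compares the sums of several metrics between yesterday and today. The sample dataset looks like this:

Suppose today is 23 December 2019. The query would sum the metrics revenue, cost and profit per customer for today (23 December) and yesterday (22 December), and flag a row as anomalous if sum(yesterday)/sum(today) is not within the threshold range 0.5-1.5.

The query would run every day and simply append the new results. Ideally the final table would look like this: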

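My main concern is that I can do this for a single metric only (revenue), and I am not sure how to apply it to all the metrics and make the query more efficient. This is the code I wrote: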

SELECT cust_id,

SUM(CASE WHEN date = DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY) 
         THEN revenue
    END) AS sum(yesterday),

SUM(CASE WHEN date = DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY)
         THEN revenue
    END) AS sum(today),

SUM(CASE WHEN date = DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY) 
         THEN revenue
    END) / SUM(CASE WHEN date = DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY)
         THEN revenue
    END) as ratio,

FROM `dataset`
GROUP BY cust_id
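Running the code gives me:


I apologize in advance if the question is unclear; I am new to this and not sure how to phrase it more precisely.

My suggestion would be to put the source data in an Excel pivot table. Move the "Values" group to Rows to get the view you want.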

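However, if you want to stick with SQL, you first need to unpivot the data so that each metric sits on its own row, and then group the intermediate result, like this: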

WITH unpivoted AS
(
    SELECT
        date
      , 'revenue'       AS metrics
      , SUM( revenue )  AS amount
      , cust_id
    FROM
        `dataset`
    GROUP
    BY
        date
      , cust_id

    UNION ALL

    SELECT
        date
      , 'cost'          AS metrics
      , SUM( cost )     AS amount
      , cust_id
    FROM
        `dataset`
    GROUP
    BY
        date
      , cust_id
    -- add more desired metrics
)
SELECT
    date as date_generated
  , cust_id
  , metrics
  , SUM( CASE WHEN date = DATE_ADD( CURRENT_DATE() , INTERVAL  0 DAY ) THEN amount END ) AS today
  , SUM( CASE WHEN date = DATE_ADD( CURRENT_DATE() , INTERVAL -1 DAY ) THEN amount END ) AS yesterday
    ...

FROM
    unpivoted
WHERE
    date >= DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY ) 
AND date <= DATE_ADD(CURRENT_DATE(), INTERVAL  0 DAY ) 

GROUP
BY
    date, cust_id, metrics
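
As a possible alternative to writing one UNION ALL branch per metric, BigQuery's UNPIVOT operator can turn the metric columns into rows in a single step. The following is only a sketch: the `sample_data` CTE and its values are made up to stand in for `dataset`, and it assumes the metric columns share one type, which UNPIVOT requires.

```sql
-- Hypothetical rows standing in for the `dataset` table.
WITH sample_data AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE '2019-12-22' AS date, 'A' AS cust_id,
           100 AS revenue, 60 AS cost, 40 AS profit),
    STRUCT(DATE '2019-12-23', 'A', 30, 20, 10),
    STRUCT(DATE '2019-12-23', 'B', 50, 25, 25)
  ])
),
-- Aggregate once per customer and day, keeping one column per metric.
daily AS (
  SELECT
    date,
    cust_id,
    SUM(revenue) AS revenue,
    SUM(cost)    AS cost,
    SUM(profit)  AS profit
  FROM sample_data
  GROUP BY date, cust_id
)
-- UNPIVOT emits one row per (date, cust_id, metric):
-- the column name goes into `metrics`, its value into `amount`.
SELECT date, cust_id, metrics, amount
FROM daily
UNPIVOT (amount FOR metrics IN (revenue, cost, profit))
ORDER BY date, cust_id, metrics;
```

Adding another metric then only means extending the `daily` SELECT list and the IN list, rather than copying a whole SELECT block.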

You can aggregate the data and then use lag() or a join to bring in the previous day's data:

with t as (
      select cust_id, date,
             sum(revenue) as revenue,
             sum(cost) as cost,
             sum(profit) as profit
      from dataset
      where date >= date_add(current_date, interval -1 day)
      group by cust_id, date
     )
select today.cust_id,  -- the CTE name t is not in scope here, so use the join alias
       today, yesterday
from t today left join
     t yesterday
     on yesterday.cust_id = today.cust_id and
        yesterday.date = date_add(current_date, interval -1 day)
where today.date = current_date;
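
The query above returns the two joined rows as whole records. A sketch of how the same self-join could project individual columns and compute the ratio and flag from the question, shown for revenue only: `sample_data` and its values are hypothetical stand-ins for `dataset`, and treating a missing ratio as anomalous is an assumption, not something the question specifies.

```sql
-- Hypothetical rows, dated relative to the current date so the sketch
-- returns data whenever it is run.
WITH sample_data AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AS date,
           'A' AS cust_id, 100 AS revenue),
    STRUCT(CURRENT_DATE(), 'A', 30),
    STRUCT(CURRENT_DATE(), 'B', 50)
  ])
),
t AS (
  SELECT cust_id, date, SUM(revenue) AS revenue
  FROM sample_data
  WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  GROUP BY cust_id, date
)
SELECT
  td.cust_id,
  td.revenue AS revenue_today,
  yd.revenue AS revenue_yesterday,
  -- SAFE_DIVIDE returns NULL instead of failing when today's sum is 0.
  SAFE_DIVIDE(yd.revenue, td.revenue) AS ratio,
  -- Assumption: a NULL ratio (no row yesterday, or zero today) counts as anomalous.
  IFNULL(SAFE_DIVIDE(yd.revenue, td.revenue) NOT BETWEEN 0.5 AND 1.5, TRUE) AS anomalous
FROM t AS td
LEFT JOIN t AS yd
  ON yd.cust_id = td.cust_id
 AND yd.date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
WHERE td.date = CURRENT_DATE();
```

The LEFT JOIN keeps customers that have data today but none yesterday, which is why the NULL case needs an explicit decision.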

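You can unpivot the columns first and then group the results. After that, you can use LAG to show a day's value and the previous day's value on the same row: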

WITH unpivoted AS
(
  SELECT
    date,
    'revenue' AS metrics,
    SUM( revenue ) AS amount,
    cust_id
  FROM
    `dataset`
  GROUP BY
    date, metrics, cust_id

UNION ALL

  SELECT
    date,
    'cost' AS metrics,
    SUM( cost ) AS amount,
    cust_id
  FROM
    `dataset`
  GROUP BY
    date, metrics, cust_id

UNION ALL

  SELECT
    date,
    'profit' AS metrics,
    SUM( profit ) AS amount,
    cust_id
  FROM
    `dataset`
  GROUP BY
    date, metrics, cust_id
)

SELECT
  date as date_generated,
  metrics,
  cust_id,
  LAG(SUM( amount )) OVER (PARTITION BY cust_id, metrics ORDER BY date) yesterday,
  SUM( amount ) AS today,
  LAG(SUM( amount )) OVER (PARTITION BY cust_id, metrics ORDER BY date) / SUM(amount) as ratio,
  CASE WHEN LAG(SUM( amount )) OVER (PARTITION BY cust_id, metrics ORDER BY date) / SUM(amount)<0.5 then 'TRUE' 
      WHEN LAG(SUM( amount )) OVER (PARTITION BY cust_id, metrics ORDER BY date) / SUM(amount)>1.5 then 'TRUE'
      WHEN LAG(SUM( amount )) OVER (PARTITION BY cust_id, metrics ORDER BY date) / SUM(amount) is NULL then 'TRUE'
      ELSE 'FALSE'
  END as anomalous

FROM
  unpivoted

WHERE date >= DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY ) AND date <= DATE_ADD(CURRENT_DATE(), INTERVAL  0 DAY )    

GROUP BY 
  date_generated, cust_id, metrics

ORDER BY 
  date_generated, metrics, cust_id
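Note that the WHERE clause limits my solution to today and yesterday; without that restriction, the same query could be used to compare the metrics across more than two days.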
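
One thing to watch with the LAG approach: within a two-day window, yesterday's own rows have no previous row, so their ratio is NULL and they are flagged too. A sketch of one way around this is to compute LAG once in a CTE and keep only today's rows afterwards. The `sample_data` rows are hypothetical and already in unpivoted form; the flag is a BOOL here rather than the 'TRUE'/'FALSE' strings used above.

```sql
-- Hypothetical unpivoted rows (one row per date, customer and metric).
WITH sample_data AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AS date,
           'A' AS cust_id, 'revenue' AS metrics, 100 AS amount),
    STRUCT(CURRENT_DATE(), 'A', 'revenue', 30),
    STRUCT(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY), 'A', 'cost', 60),
    STRUCT(CURRENT_DATE(), 'A', 'cost', 50)
  ])
),
with_prev AS (
  SELECT
    date,
    cust_id,
    metrics,
    SUM(amount) AS today,
    -- Previous row of the same customer and metric inside the two-day window.
    LAG(SUM(amount)) OVER (PARTITION BY cust_id, metrics ORDER BY date) AS yesterday
  FROM sample_data
  WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  GROUP BY date, cust_id, metrics
)
SELECT
  date AS date_generated,
  cust_id,
  metrics,
  today,
  yesterday,
  SAFE_DIVIDE(yesterday, today) AS ratio,
  -- Assumption: a NULL ratio is treated as anomalous, as in the CASE expression above.
  IFNULL(SAFE_DIVIDE(yesterday, today) NOT BETWEEN 0.5 AND 1.5, TRUE) AS anomalous
FROM with_prev
-- Filter after LAG has been evaluated, so only today's rows are reported.
WHERE date = CURRENT_DATE()
ORDER BY metrics, cust_id;
```

The date filter has to sit outside the CTE: applying `date = CURRENT_DATE()` before the window function runs would leave LAG with nothing to look back at.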