In BigQuery SQL, how do I count occurrences of a specific event before and after another event?

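I have a table with dates, events, and users. There is an event named "A". I want to know how many times a specific kind of event occurred before and after event "A" in BigQuery SQL. For example: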

   User           Date             Events
    123          2018-02-13            X.Y.A
    123          2018-02-12            X.Y.B
    134          2018-02-10            Y.Z.A
    123          2018-02-11            A
    123          2018-02-01            X.Y.Z
    134          2018-02-05            X.Y.B
    134          2018-02-04            A

The output should look like this:

User       Event    Before   After
123          A      1        3
134          A      0        1

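The events I need to count share a specific prefix: I have to look for events that start with X.Y. followed by some event name. So X.Y.SomeEvent is an event that should increment the counter. Any suggestions?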

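Use a window function to find the date on which event A occurred. Then use conditional aggregation to count the matching events before and after that date: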

select userid,
       -- count prefixed events strictly before / after the date of event A
       sum(case when date < a_date and event like 'X.Y.%' then 1 else 0 end) as before,
       sum(case when date > a_date and event like 'X.Y.%' then 1 else 0 end) as after
from (select t.*,
             -- date of event A for this user (the earliest one, if there are several)
             min(case when event = 'A' then date end) over (partition by userid) as a_date
      from t
     ) t
group by userid
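
To make the two steps easier to check by hand, here is a minimal Python sketch of the same logic run against the sample rows from the question. It is only an illustration, not BigQuery code: the names `ROWS` and `count_around_anchor` are made up for this example, and dates are kept as ISO strings on the assumption that they compare chronologically.

```python
# Sample rows from the question: (user, date, event).
# ISO-formatted date strings compare in chronological order.
ROWS = [
    (123, "2018-02-13", "X.Y.A"),
    (123, "2018-02-12", "X.Y.B"),
    (134, "2018-02-10", "Y.Z.A"),
    (123, "2018-02-11", "A"),
    (123, "2018-02-01", "X.Y.Z"),
    (134, "2018-02-05", "X.Y.B"),
    (134, "2018-02-04", "A"),
]


def count_around_anchor(rows, anchor="A", prefix="X.Y."):
    """Count prefixed events before and after each user's earliest anchor event."""
    # Step 1: earliest anchor date per user
    # (plays the role of MIN(CASE ...) OVER (PARTITION BY userid)).
    anchor_date = {}
    for user, date, event in rows:
        if event == anchor:
            anchor_date[user] = min(date, anchor_date.get(user, date))

    # Step 2: conditional counts per user
    # (plays the role of SUM(CASE ...) with GROUP BY userid).
    counts = {user: {"before": 0, "after": 0} for user, _, _ in rows}
    for user, date, event in rows:
        a_date = anchor_date.get(user)
        if a_date is None or not event.startswith(prefix):
            continue  # no anchor for this user, or not a prefixed event
        if date < a_date:
            counts[user]["before"] += 1
        elif date > a_date:
            counts[user]["after"] += 1
    return counts


result = count_around_anchor(ROWS)
print(result)
```

On the rows shown in the question, user 123 has two X.Y. events after A (X.Y.B and X.Y.A), so this sketch reports 2 for that user rather than the 3 in the expected output above. Because the comparison is strict, an X.Y. event on the same date as A counts toward neither side.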

Below is an example for BigQuery Standard SQL:

#standardSQL
SELECT user, event, before, after 
FROM (
  SELECT user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING ) after
  FROM `project.dataset.events`
)
WHERE event = 'A'
-- ORDER BY user

You can test this with the dummy data from your question:

#standardSQL
WITH `project.dataset.events` AS (
  SELECT 123 user, '2018-02-13' dt, 'X.Y.A' event UNION ALL
  SELECT 123, '2018-02-12', 'X.Y.B' UNION ALL
  SELECT 123, '2018-02-11', 'A' UNION ALL
  SELECT 134, '2018-02-10', 'Y.Z.A' UNION ALL
  SELECT 134, '2018-02-05', 'X.Y.B' UNION ALL
  SELECT 134, '2018-02-04', 'A' UNION ALL
  SELECT 123, '2018-02-01', 'X.Y.Z' 
)
SELECT user, event, before, after 
FROM (
  SELECT user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY dt ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING ) after
  FROM `project.dataset.events`
)
WHERE event = 'A'
ORDER BY user
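
The two window frames in this query are position-based: for each row they look at all earlier rows and all later rows within the same user, excluding the current row. The following Python sketch emulates that behaviour with list slicing, as a way to reason about the result outside BigQuery. The function name `frame_counts` and the data layout are assumptions made for this illustration.

```python
from itertools import groupby

# Same dummy data as in the WITH clause: (user, dt, event).
ROWS = [
    (123, "2018-02-13", "X.Y.A"),
    (123, "2018-02-12", "X.Y.B"),
    (123, "2018-02-11", "A"),
    (134, "2018-02-10", "Y.Z.A"),
    (134, "2018-02-05", "X.Y.B"),
    (134, "2018-02-04", "A"),
    (123, "2018-02-01", "X.Y.Z"),
]


def frame_counts(rows, anchor="A", prefix="X.Y."):
    """Emulate the two COUNTIF(...) OVER (...) frames, one output row per anchor row."""
    out = []
    # PARTITION BY user ORDER BY dt
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for user, group in groupby(ordered, key=lambda r: r[0]):
        events = [event for _, _, event in group]
        for i, event in enumerate(events):
            if event != anchor:
                continue  # WHERE event = 'A'
            # ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            before = sum(e.startswith(prefix) for e in events[:i])
            # ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
            after = sum(e.startswith(prefix) for e in events[i + 1:])
            out.append((user, event, before, after))
    return out


print(frame_counts(ROWS))
```

If a user has more than one A event, this approach produces one row per A, and the same X.Y. event is counted toward every A row it precedes or follows. Rows that share the same dt as A fall on whichever side the sort happens to place them, since the frames go by row position rather than by date comparison.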

Hey Gordon, I get this error when I run the query. Can you tell me what I'm doing wrong? Error: SELECT clause mixes aggregate 'a_date' and fields 'user, name, EventDate' without GROUP BY clause.

Hey Mikhail, thanks, this worked for me. I just want to know one thing. Here we assume that A occurs only once. What if event A occurs multiple times for a user? How do we make sure the same X.Y event is not counted toward another event A?

In your previous question you stated that each user has event A only once, so the solution above takes advantage of that fact.

Yes, but I'm trying to learn this and would like to know how it could be done.

Understood. I suggest you post it as a new question, so we are not limited by the comment format :o