SQL: Count a user's actions by type before a specific action in BigQuery

I have a table containing a log of actions performed by users. The action types are create, confirm, and cancel, as shown below:

action   datetime              user
create   2019-01-01 10:00:00   A
create   2019-01-05 10:00:00   A
confirm  2019-01-07 10:00:00   A
create   2019-01-07 10:00:00   A
cancel   2019-01-08 10:00:00   A
create   2019-01-09 10:00:00   A
create   2019-01-03 10:00:00   B
cancel   2019-01-08 10:00:00   B
create   2019-01-12 10:00:00   B
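For each create action, I want the number of actions of each type that the same user performed before it. For the data above, the result would look like this: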

action   datetime              user  create  confirm  cancel
create   2019-01-01 10:00:00   A     0       0        0
create   2019-01-05 10:00:00   A     1       0        0
create   2019-01-07 10:00:00   A     2       1        0
create   2019-01-09 10:00:00   A     3       1        1
create   2019-01-03 10:00:00   B     0       0        0
create   2019-01-12 10:00:00   B     1       0        1
I have been trying to adapt an existing solution, but I cannot get a separate count for each action type:

SELECT * FROM 
  ( select *, count(1) OVER(PARTITION BY action, user ORDER BY datetime ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS `create` from `table` ) 
  WHERE action = 'create' ORDER BY datetime LIMIT 20;
Any ideas?

Update: I finally ended up with two queries.

Query 1:

select *
from (select t.*,  
             countif(action = 'create') over (PARTITION BY user order by datetime rows between unbounded preceding and 1 preceding) as num_create,
             countif(action = 'confirm') over (PARTITION BY user order by datetime rows between unbounded preceding and 1 preceding) as num_confirm,
             countif(action = 'cancel') over (PARTITION BY user order by datetime rows between unbounded preceding and 1 preceding) as num_cancel
      from t
     ) t
where action = 'create' order by datetime;
Query 2:

select *
from (select t.*, 
             countif(action = 'create') over (PARTITION BY user order by datetime_diff(datetime(datetime), datetime('2000-01-01'), SECOND) range between unbounded preceding and 1 preceding) as num_create,
             countif(action = 'confirm') over (PARTITION BY user order by datetime_diff(datetime(datetime), datetime('2000-01-01'), SECOND) range between unbounded preceding and 1 preceding) as num_confirm,
             countif(action = 'cancel') over (PARTITION BY user order by datetime_diff(datetime(datetime), datetime('2000-01-01'), SECOND) range between unbounded preceding and 1 preceding) as num_cancel
      from t
     ) t
where action = 'create' order by datetime;
Query 1 works better when a user has several actions at the same time. Thanks.

Update 2: final query

#standardSQL
SELECT * FROM (
  SELECT action, datetime, user, 
    COUNTIF(action = 'create') OVER(win) `create`,
    COUNTIF(action = 'confirm') OVER(win) confirm,
    COUNTIF(action = 'cancel') OVER(win) cancel
  FROM `project.dataset.table`
  WINDOW win AS (
    PARTITION BY user 
    ORDER BY datetime, CASE action WHEN 'create' THEN 1 ELSE 0 END 
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
  ) 
)
WHERE action = 'create'
ORDER BY user, datetime

You seem to want cumulative sums and filtering:

select *
from (select t.*,
             countif(action = 'create') over (order by datetime rows between unbounded preceding and 1 preceding) as num_create,
             countif(action = 'confirm') over (order by datetime rows between unbounded preceding and 1 preceding) as num_confirm,
             countif(action = 'cancel') over (order by datetime rows between unbounded preceding and 1 preceding) as num_cancel
      from t
     ) t
where action = 'create';
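Actually, your times have duplicate values, which can be a bit tricky. I would suggest changing the question so that the current timestamp is included in the range. However, if you really do need "before", you can use RANGE. Unfortunately, INTERVAL is not allowed in the frame, so you have to convert the ordering key to a number: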

-- DATETIME_DIFF takes (end, start, part); this assumes the datetime column has type DATETIME
select *
from (select t.*,
             countif(action = 'create') over (order by datetime_diff(datetime, datetime('2000-01-01'), second) range between unbounded preceding and 1 preceding) as num_create,
             countif(action = 'confirm') over (order by datetime_diff(datetime, datetime('2000-01-01'), second) range between unbounded preceding and 1 preceding) as num_confirm,
             countif(action = 'cancel') over (order by datetime_diff(datetime, datetime('2000-01-01'), second) range between unbounded preceding and 1 preceding) as num_cancel
      from t
     ) t
where action = 'create';
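
To see how the ROWS and RANGE frames differ on tied timestamps, here is a small Python sketch (not BigQuery code) that emulates both frames for user A's sample rows. The helper names `rows_frame_counts` and `range_frame_counts` are made up for illustration. The sketch assumes whole-second timestamps, and the ROWS version assumes tied rows stay in the order listed in the question, which a real engine does not guarantee.

```python
from datetime import datetime, timedelta

# User A's rows from the question, as (action, timestamp) pairs.
ROWS_A = [
    ("create", datetime(2019, 1, 1, 10)),
    ("create", datetime(2019, 1, 5, 10)),
    ("confirm", datetime(2019, 1, 7, 10)),
    ("create", datetime(2019, 1, 7, 10)),
    ("cancel", datetime(2019, 1, 8, 10)),
    ("create", datetime(2019, 1, 9, 10)),
]
ACTIONS = ("create", "confirm", "cancel")


def count_by_action(rows):
    """Count how many rows of each action type appear in `rows`."""
    return {a: sum(1 for action, _ in rows if action == a) for a in ACTIONS}


def rows_frame_counts(rows):
    """Emulate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING.

    The frame is every row positioned before the current one in sort order.
    Python's sort is stable, so tied rows keep their input order here.
    """
    ordered = sorted(rows, key=lambda r: r[1])
    return [(ts, count_by_action(ordered[:i]))
            for i, (action, ts) in enumerate(ordered) if action == "create"]


def range_frame_counts(rows):
    """Emulate RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING on seconds.

    The frame is every row at least one second earlier than the current row,
    so rows sharing the current timestamp are never included.
    """
    one_second = timedelta(seconds=1)
    return [(ts, count_by_action([r for r in rows if r[1] <= ts - one_second]))
            for action, ts in rows if action == "create"]


for (ts, by_rows), (_, by_range) in zip(rows_frame_counts(ROWS_A),
                                        range_frame_counts(ROWS_A)):
    print(ts, "ROWS:", by_rows, "RANGE:", by_range)
```

For the create on 2019-01-07, the RANGE emulation leaves out the confirm logged in the same second, while the ROWS emulation counts it only because the confirm happens to sort first.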

Below is for BigQuery Standard SQL

#standardSQL
SELECT * FROM (
  SELECT action, datetime, user, 
    COUNTIF(action = 'create') OVER(win) `create`,
    COUNTIF(action = 'confirm') OVER(win) confirm,
    COUNTIF(action = 'cancel') OVER(win) cancel
  FROM `project.dataset.table`
  WINDOW win AS (
    PARTITION BY user 
    ORDER BY datetime, CASE action WHEN 'create' THEN 1 ELSE 0 END 
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
  ) 
)
WHERE action = 'create'
ORDER BY user, datetime
If applied to the sample data from your question, the result is

Row action  datetime                user    create  confirm cancel   
1   create  2019-01-01T10:00:00     A       0       0       0    
2   create  2019-01-05T10:00:00     A       1       0       0    
3   create  2019-01-07T10:00:00     A       2       1       0    
4   create  2019-01-09T10:00:00     A       3       1       1    
5   create  2019-01-03T10:00:00     B       0       0       0    
6   create  2019-01-12T10:00:00     B       1       0       1      

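Note: the "problem" of duplicate datetimes is addressed by adding CASE action WHEN 'create' THEN 1 ELSE 0 END to the ORDER BY clause of the window expression, so that a create row sorts after any other action with the same datetime.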
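
The effect of that tie-breaker can be sketched outside BigQuery. The Python below is only an illustration of the ordering idea, not the query itself; the function name `counts_before_create` is hypothetical, and the rows are the sample data from the question. It sorts by user, then datetime, then a flag that is true for create rows, and keeps a running count per user.

```python
from collections import Counter
from itertools import groupby

# Sample rows from the question as (action, datetime, user).
# ISO-formatted strings sort in chronological order.
LOG = [
    ("create", "2019-01-01 10:00:00", "A"),
    ("create", "2019-01-05 10:00:00", "A"),
    ("confirm", "2019-01-07 10:00:00", "A"),
    ("create", "2019-01-07 10:00:00", "A"),
    ("cancel", "2019-01-08 10:00:00", "A"),
    ("create", "2019-01-09 10:00:00", "A"),
    ("create", "2019-01-03 10:00:00", "B"),
    ("cancel", "2019-01-08 10:00:00", "B"),
    ("create", "2019-01-12 10:00:00", "B"),
]


def counts_before_create(log):
    """For each create row, count the same user's earlier actions by type."""
    # Mirrors PARTITION BY user ORDER BY datetime, CASE ... END:
    # within a tied datetime, non-create rows (False) sort before create (True).
    ordered = sorted(log, key=lambda r: (r[2], r[1], r[0] == "create"))
    result = []
    for user, rows in groupby(ordered, key=lambda r: r[2]):
        seen = Counter()  # running counts of all rows before the current one
        for action, ts, _ in rows:
            if action == "create":
                result.append((action, ts, user,
                               seen["create"], seen["confirm"], seen["cancel"]))
            seen[action] += 1
    return result


for row in counts_before_create(LOG):
    print(*row)
```

Because the flag is part of the sort key, the counts for the create on 2019-01-07 no longer depend on the order in which the tied rows arrive.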

Thanks! I made some changes to the query to get the actions per user, but as you said, duplicate times are tricky: sometimes an action is not counted because it happened at the same time. Can I add some kind of positional ordering? I looked at row_number, but I don't know how to pass that value into the seconds so that it affects the order in OVER.

@AlvaroFlores. The second query uses a RANGE frame to solve this problem.

Yes, thanks, I tried both. The first one works better when actions have the same datetime and user.

@AlvaroFlores. Mikhail's answer is the same as my first query, but posted many, many hours later. I am curious why you accepted that answer.

I don't think they are the same - they are totally different!! They even produce different results. So I think your answer is wrong, but that is for the OP to judge :o) It looks like he did, and now he has re-accepted your answer. I think this is quite misleading :o)

Have you tried this solution? It looks like you missed it! :o)