SQL: count each type of action a user performed before a specific action in BigQuery
I have a table with a log of the actions performed by users. The action types are create, confirm, and cancel, as below:
action datetime user
create 2019-01-01 10:00:00 A
create 2019-01-05 10:00:00 A
confirm 2019-01-07 10:00:00 A
create 2019-01-07 10:00:00 A
cancel 2019-01-08 10:00:00 A
create 2019-01-09 10:00:00 A
create 2019-01-03 10:00:00 B
cancel 2019-01-08 10:00:00 B
create 2019-01-12 10:00:00 B
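For each create action, I want the number of actions of each type that the user performed before it. For the data above, the result should look like this: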
action datetime user create confirm cancel
create 2019-01-01 10:00:00 A 0 0 0
create 2019-01-05 10:00:00 A 1 0 0
create 2019-01-07 10:00:00 A 2 1 0
create 2019-01-09 10:00:00 A 3 1 1
create 2019-01-03 10:00:00 B 0 0 0
create 2019-01-12 10:00:00 B 1 0 1
I have been trying to adapt a solution, but I can't get separate counts for the different action types:
SELECT * FROM
( select *, count(1) OVER(PARTITION BY action, user ORDER BY datetime ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) `create` from `table` )
WHERE action = 'create' ORDER BY datetime LIMIT 20;
Any ideas?
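Why the attempt gives only one count: PARTITION BY action, user puts each action type into its own window, so a create row can only ever see earlier creates, never the confirm and cancel rows. The sketch below is a pure-Python emulation of that window on the sample data (not BigQuery itself; the names LOG and attempt_partition_by_action_user are hypothetical).

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (action, datetime, user)
LOG = [
    ("create", datetime(2019, 1, 1, 10), "A"),
    ("create", datetime(2019, 1, 5, 10), "A"),
    ("confirm", datetime(2019, 1, 7, 10), "A"),
    ("create", datetime(2019, 1, 7, 10), "A"),
    ("cancel", datetime(2019, 1, 8, 10), "A"),
    ("create", datetime(2019, 1, 9, 10), "A"),
    ("create", datetime(2019, 1, 3, 10), "B"),
    ("cancel", datetime(2019, 1, 8, 10), "B"),
    ("create", datetime(2019, 1, 12, 10), "B"),
]


def attempt_partition_by_action_user(log):
    """Emulate COUNT(1) OVER (PARTITION BY action, user ORDER BY datetime
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) for every row."""
    seen = defaultdict(int)  # rows already passed in each (action, user) partition
    out = []
    # Visit rows partition by partition, in datetime order inside each partition
    for action, ts, user in sorted(log, key=lambda r: (r[2], r[0], r[1])):
        out.append((action, ts, user, seen[(action, user)]))
        seen[(action, user)] += 1
    return out


# Equivalent of the outer WHERE action = 'create'
creates = [row for row in attempt_partition_by_action_user(LOG) if row[0] == "create"]
for row in creates:
    print(row)
```

Each create only gets the number of earlier creates (0, 1, 2, 3 for A; 0, 1 for B), because the confirm and cancel rows live in other partitions. Partitioning by user alone and counting each type conditionally avoids this.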
Update: I finally got two queries.
Query 1:
select *
from (select t.*,
countif(action = 'create') over (PARTITION BY user order by datetime rows between unbounded preceding and 1 preceding) as num_create,
countif(action = 'confirm') over (PARTITION BY user order by datetime rows between unbounded preceding and 1 preceding) as num_confirm,
countif(action = 'cancel') over (PARTITION BY user order by datetime rows between unbounded preceding and 1 preceding) as num_cancel
from t
) t
where action = 'create' order by datetime;
Query 2:
select *
from (select t.*,
countif(action = 'create') over (PARTITION BY user order by datetime_diff(datetime(datetime), datetime('2000-01-01'), SECOND) range between unbounded preceding and 1 preceding) as num_create,
countif(action = 'confirm') over (PARTITION BY user order by datetime_diff(datetime(datetime), datetime('2000-01-01'), SECOND) range between unbounded preceding and 1 preceding) as num_confirm,
countif(action = 'cancel') over (PARTITION BY user order by datetime_diff(datetime(datetime), datetime('2000-01-01'), SECOND) range between unbounded preceding and 1 preceding) as num_cancel
from t
) t
where action = 'create' order by datetime;
Query 1 works better when a user has several actions at the same time. Thanks.
Update 2: final query
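Why Query 1 can miss an action when two rows share a timestamp: with ORDER BY datetime and a ROWS frame, rows that tie on datetime may come in either order, and 1 PRECEDING excludes only the physically current row. The pure-Python sketch below (hypothetical names, user A only, not BigQuery itself) runs the same frame over both valid orders of the 2019-01-07 tie.

```python
from datetime import datetime

# User A's rows from the question: (action, datetime)
LOG_A = [
    ("create", datetime(2019, 1, 1, 10)),
    ("create", datetime(2019, 1, 5, 10)),
    ("confirm", datetime(2019, 1, 7, 10)),
    ("create", datetime(2019, 1, 7, 10)),
    ("cancel", datetime(2019, 1, 8, 10)),
    ("create", datetime(2019, 1, 9, 10)),
]


def rows_frame_counts(rows_in_window_order):
    """Emulate COUNTIF(action = ...) OVER (ORDER BY datetime ROWS BETWEEN
    UNBOUNDED PRECEDING AND 1 PRECEDING), keeping only the create rows."""
    counts = {"create": 0, "confirm": 0, "cancel": 0}
    out = []
    for action, ts in rows_in_window_order:
        if action == "create":
            out.append((ts, counts["create"], counts["confirm"], counts["cancel"]))
        counts[action] += 1  # the current row only joins the frame of later rows
    return out


# ORDER BY datetime alone leaves the 2019-01-07 tie unordered; both are valid orders
confirm_first = sorted(LOG_A, key=lambda r: (r[1], r[0] == "create"))
create_first = sorted(LOG_A, key=lambda r: (r[1], r[0] != "create"))

print(rows_frame_counts(confirm_first)[2])
print(rows_frame_counts(create_first)[2])
```

Only the 2019-01-07 create changes: its confirm count is 1 if the confirm is placed first and 0 otherwise. A deterministic tie-breaker in the window's ORDER BY removes that ambiguity.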
#standardSQL
SELECT * FROM (
SELECT action, datetime, user,
COUNTIF(action = 'create') OVER(win) `create`,
COUNTIF(action = 'confirm') OVER(win) confirm,
COUNTIF(action = 'cancel') OVER(win) cancel
FROM `project.dataset.table`
WINDOW win AS (
PARTITION BY user
ORDER BY datetime, CASE action WHEN 'create' THEN 1 ELSE 0 END
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
)
WHERE action = 'create'
ORDER BY user, datetime
It looks like you need a cumulative sum and filtering:
select *
from (select t.*,
countif(action = 'create') over (order by datetime rows between unbounded preceding and 1 preceding) as num_create,
countif(action = 'confirm') over (order by datetime rows between unbounded preceding and 1 preceding) as num_confirm,
countif(action = 'cancel') over (order by datetime rows between unbounded preceding and 1 preceding) as num_cancel
from t
) t
where action = 'create';
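Actually, your times have duplicate values, which makes this a bit tricky. I would suggest changing the problem so that rows at the current time are included. But if you really do need "before", you can use RANGE. Unfortunately, an INTERVAL is not allowed in the frame, so you have to convert the ordering criterion to a number: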
select *
from (select t.*,
countif(action = 'create') over (order by datetime_diff(datetime, datetime('2000-01-01'), second) range between unbounded preceding and 1 preceding) as num_create,
countif(action = 'confirm') over (order by datetime_diff(datetime, datetime('2000-01-01'), second) range between unbounded preceding and 1 preceding) as num_confirm,
countif(action = 'cancel') over (order by datetime_diff(datetime, datetime('2000-01-01'), second) range between unbounded preceding and 1 preceding) as num_cancel
from t
) t
where action = 'create';
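How the RANGE version treats ties: converting the datetime to whole seconds makes the ORDER BY key numeric, and RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING then keeps every row whose key is at least one second smaller, so all rows sharing the current timestamp are excluded regardless of their order. A pure-Python sketch (not BigQuery; names hypothetical; partitioned per user as in the asker's Query 2):

```python
from datetime import datetime

EPOCH = datetime(2000, 1, 1)  # same reference point as datetime('2000-01-01')

# Sample rows from the question: (action, datetime, user)
LOG = [
    ("create", datetime(2019, 1, 1, 10), "A"),
    ("create", datetime(2019, 1, 5, 10), "A"),
    ("confirm", datetime(2019, 1, 7, 10), "A"),
    ("create", datetime(2019, 1, 7, 10), "A"),
    ("cancel", datetime(2019, 1, 8, 10), "A"),
    ("create", datetime(2019, 1, 9, 10), "A"),
    ("create", datetime(2019, 1, 3, 10), "B"),
    ("cancel", datetime(2019, 1, 8, 10), "B"),
    ("create", datetime(2019, 1, 12, 10), "B"),
]


def seconds_key(ts):
    # Rough Python analogue of datetime_diff(ts, datetime('2000-01-01'), second)
    # for whole-second timestamps after 2000-01-01
    return int((ts - EPOCH).total_seconds())


def range_frame_counts(log, user):
    """Emulate COUNTIF(action = ...) OVER (PARTITION BY user ORDER BY <seconds key>
    RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) for the user's create rows."""
    rows = [(action, seconds_key(ts), ts) for action, ts, u in log if u == user]
    out = []
    for action, key, ts in sorted(rows, key=lambda r: r[1]):
        if action != "create":
            continue
        # The frame is every row whose key is <= key - 1: same-second peers are excluded
        frame = [a for a, k, _ in rows if k <= key - 1]
        out.append((ts, frame.count("create"), frame.count("confirm"), frame.count("cancel")))
    return out


print(range_frame_counts(LOG, "A"))
print(range_frame_counts(LOG, "B"))
```

On the sample data this gives a confirm count of 0 for user A's 2019-01-07 create, not the 1 in the expected output, because the confirm in the same second is excluded. That is consistent with the asker finding Query 1 a better fit for tied rows.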
Below is for BigQuery Standard SQL:
#standardSQL
SELECT * FROM (
SELECT action, datetime, user,
COUNTIF(action = 'create') OVER(win) `create`,
COUNTIF(action = 'confirm') OVER(win) confirm,
COUNTIF(action = 'cancel') OVER(win) cancel
FROM `project.dataset.table`
WINDOW win AS (
PARTITION BY user
ORDER BY datetime, CASE action WHEN 'create' THEN 1 ELSE 0 END
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
)
WHERE action = 'create'
ORDER BY user, datetime
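What the WINDOW clause does: within each user, rows are ordered by datetime and, on a tie, non-create rows (0) come before create rows (1); the ROWS frame ending at 1 PRECEDING then counts everything placed before the current row. A pure-Python emulation of this with running counters (a sketch, not BigQuery; names hypothetical):

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (action, datetime, user)
LOG = [
    ("create", datetime(2019, 1, 1, 10), "A"),
    ("create", datetime(2019, 1, 5, 10), "A"),
    ("confirm", datetime(2019, 1, 7, 10), "A"),
    ("create", datetime(2019, 1, 7, 10), "A"),
    ("cancel", datetime(2019, 1, 8, 10), "A"),
    ("create", datetime(2019, 1, 9, 10), "A"),
    ("create", datetime(2019, 1, 3, 10), "B"),
    ("cancel", datetime(2019, 1, 8, 10), "B"),
    ("create", datetime(2019, 1, 12, 10), "B"),
]


def final_query(log):
    """Emulate the final query's window and outer WHERE / ORDER BY."""
    def window_order(row):
        action, ts, user = row
        # PARTITION BY user, ORDER BY datetime, CASE action WHEN 'create' THEN 1 ELSE 0 END
        return (user, ts, 1 if action == "create" else 0)

    # Per-user counts of rows already passed: the ROWS ... AND 1 PRECEDING frame
    running = defaultdict(lambda: {"create": 0, "confirm": 0, "cancel": 0})
    out = []
    for action, ts, user in sorted(log, key=window_order):
        counts = running[user]
        if action == "create":  # WHERE action = 'create'
            out.append((action, ts, user, counts["create"], counts["confirm"], counts["cancel"]))
        counts[action] += 1
    return out  # already in ORDER BY user, datetime order


for row in final_query(LOG):
    print(row)
```

It reproduces the counts in the result table for the sample data. The CASE only breaks ties between a create and another action type; two creates with the same timestamp would still be ordered arbitrarily relative to each other.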
If applied to the sample data from your question, the result is:
Row action datetime user create confirm cancel
1 create 2019-01-01T10:00:00 A 0 0 0
2 create 2019-01-05T10:00:00 A 1 0 0
3 create 2019-01-07T10:00:00 A 2 1 0
4 create 2019-01-09T10:00:00 A 3 1 1
5 create 2019-01-03T10:00:00 B 0 0 0
6 create 2019-01-12T10:00:00 B 1 0 1
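Note: the "problem" of duplicate dates is resolved by adding CASE action WHEN 'create' THEN 1 ELSE 0 END to the ORDER BY clause of the window definition, so that on equal datetimes the other actions are ordered before the create.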
Thanks! I made some changes to the query to get the actions per user, but as you said, the duplicate times are tricky: sometimes an action is not counted because it happened at the same time. Can I add some kind of position ordering? I looked at row number, but I don't know how to pass that value into the seconds so that it affects the ordering in the OVER clause.
@AlvaroFlores. The second query, which uses RANGE, solves this problem.
Yes, thanks, I tried both approaches. The first one works better when actions have the same datetime and user.
@AlvaroFlores. Mikhail's answer is the same as my first query, but was posted many, many hours later. I'm curious why you accepted that answer.
I don't think they are the same; they are completely different!! They even produce different results. So I think you are wrong, but it is up to the OP to judge :o)
Looks like he did, and now he has re-accepted your answer. I think that is quite misleading :o)
Did you try this solution? Looks like you missed it! :o)