SQL: get a row number for consecutive states and reset it on change
Tags: sql, amazon-redshift, window-functions
I want to track how many weeks in a row a user has been in the same state (for example, logged in). I have tried row_number() over (partition by state order by week), but the row number does not reset when the state changes. Here is a sample table:
user_id | week | state
--------+--------------+-------
1 | 2018-01-01 | Active
1 | 2018-01-08 | Inactive
1 | 2018-01-15 | Inactive
1 | 2018-01-22 | Active
1 | 2018-01-29 | Active
2 | 2018-01-01 | Inactive
2 | 2018-01-08 | Active
2 | 2018-01-15 | Inactive
2 | 2018-01-22 | Active
2 | 2018-01-29 | Active
I would like the output to look like this:
user_id | week | state | streak
--------+--------------+----------+---------
1000 | 2018-01-01 | Active | 1
1000 | 2018-01-08 | Inactive | 1
1000 | 2018-01-15 | Inactive | 2
1000 | 2018-01-22 | Active | 1
1000 | 2018-01-29 | Active | 2
2000 | 2018-01-01 | Inactive | 1
2000 | 2018-01-08 | Active | 1
2000 | 2018-01-15 | Inactive | 1
2000 | 2018-01-22 | Active | 1
2000 | 2018-01-29 | Active | 2
Here is my current query:
SELECT
    week,
    user_id,
    state,
    row_number()
        OVER(PARTITION BY user_id, state
             order by user_id, week) AS streak
FROM
    t.data_table
GROUP BY 1,2,3
order by week;
My output currently looks like this:
user_id | week | state | streak
--------+--------------+----------+---------
1000 | 2018-01-01 | Active | 1
1000 | 2018-01-08 | Inactive | 1
1000 | 2018-01-15 | Inactive | 2
1000 | 2018-01-22 | Active | 2
1000 | 2018-01-29 | Active | 3
2000 | 2018-01-01 | Inactive | 1
2000 | 2018-01-08 | Active | 1
2000 | 2018-01-15 | Inactive | 2
2000 | 2018-01-22 | Active | 2
2000 | 2018-01-29 | Active | 3
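Any suggestions here would be helpful.

This is a gaps-and-islands problem. The strategy is to define groups of adjacent rows that share the same state, and then use row_number() to enumerate the rows within each group.
One approach uses the difference of row numbers: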
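-- The column is named state in the question's table; t stands in for the source table.
select t.*,
       -- number the rows inside each run of identical states
       row_number() over (partition by user_id, state, seqnum - seqnum_s order by week) as streak
from (select t.*,
             -- position of the row among all of the user's weeks
             row_number() over (partition by user_id order by week) as seqnum,
             -- position of the row among the user's weeks with the same state
             row_number() over (partition by user_id, state order by week) as seqnum_s
      from t
     ) t;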
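To make the intermediate values visible, here is a minimal sketch that runs the same difference-of-row-numbers technique on the five sample rows for user_id 2. The CTE names sample_rows and numbered and the column alias grp are made up for illustration, and the weeks are written as ISO-formatted text so the snippet does not depend on a particular date type. It is meant as an illustration under those assumptions, not as the definitive query for the real table.

```sql
-- Hypothetical inline data: the question's sample rows for user_id = 2.
WITH sample_rows AS (
    SELECT 2 AS user_id, '2018-01-01' AS week, 'Inactive' AS state
    UNION ALL SELECT 2, '2018-01-08', 'Active'
    UNION ALL SELECT 2, '2018-01-15', 'Inactive'
    UNION ALL SELECT 2, '2018-01-22', 'Active'
    UNION ALL SELECT 2, '2018-01-29', 'Active'
),
numbered AS (
    SELECT user_id, week, state,
           -- position among all weeks of the user
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY week) AS seqnum,
           -- position among the user's weeks that have the same state
           ROW_NUMBER() OVER (PARTITION BY user_id, state ORDER BY week) AS seqnum_s
    FROM sample_rows
)
SELECT user_id, week, state, seqnum, seqnum_s,
       -- constant within a run of consecutive identical states
       seqnum - seqnum_s AS grp,
       ROW_NUMBER() OVER (PARTITION BY user_id, state, seqnum - seqnum_s
                          ORDER BY week) AS streak
FROM numbered
ORDER BY user_id, week;

-- Result:
-- user_id | week       | state    | seqnum | seqnum_s | grp | streak
-- --------+------------+----------+--------+----------+-----+-------
--       2 | 2018-01-01 | Inactive |      1 |        1 |   0 |      1
--       2 | 2018-01-08 | Active   |      2 |        1 |   1 |      1
--       2 | 2018-01-15 | Inactive |      3 |        2 |   1 |      1
--       2 | 2018-01-22 | Active   |      4 |        2 |   2 |      1
--       2 | 2018-01-29 | Active   |      5 |        3 |   2 |      2
```

Note that the Active row for 2018-01-08 and the Inactive row for 2018-01-15 both have grp = 1. The difference alone does not separate the two runs, so state has to be part of the outer partition as well.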
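Explaining how it works is a bit tricky. If you look at the results of the subquery, you will see how the difference between the two row numbers identifies each group of consecutive rows with the same state.

Comments:
The second row_number in the inner query has a comma that causes a syntax error... I would edit it if you are OK with that, but the change is less than 6 characters...
@Zack. Thank you.
I had to add "state" to the partition by in the outer query. My final query was: row_number() over (partition by user_id, seqnum - seqnum_s, state order by ..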