SQL (Amazon Redshift): check whether the current record's value exists in the next partition
I am trying to write a SQL statement that checks whether the current record's value exists in the next partition, without using a self-join on the table.
For example:
Input table:
userid | time                | product
-------|---------------------|--------
1      | 2020-01-10 8:00:00  | A
1      | 2020-01-10 9:00:00  | B
1      | 2020-01-10 9:00:00  | A
1      | 2020-01-10 10:00:00 | C
1      | 2020-01-10 10:00:00 | B
1      | 2020-01-10 11:00:00 | D
1      | 2020-01-10 11:00:00 | E
1      | 2020-01-10 11:00:00 | A
Output table:
userid | time                | product | Is_Repeated?
-------|---------------------|---------|-------------
1      | 2020-01-10 8:00:00  | A       | 1
1      | 2020-01-10 9:00:00  | B       | 1
1      | 2020-01-10 9:00:00  | A       | 0
1      | 2020-01-10 10:00:00 | C       | 0
1      | 2020-01-10 10:00:00 | B       | 1
1      | 2020-01-10 11:00:00 | D       | 0
1      | 2020-01-10 11:00:00 | B       | 0
1      | 2020-01-10 11:00:00 | A       | 0
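Below is what I am trying. It only looks at the next record within the partition rather than at every record in the next partition, and it returns 0 in my flag for every record.
SELECT userid, "time", product,
       CASE WHEN LEAD(product) OVER (PARTITION BY userid ORDER BY "time") = product
            THEN 1 ELSE 0 END AS "Is_Repeated?"
FROM Input_table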
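Rewrite your query, but order by product and then by time. Each row is then compared with the next occurrence of the same product, so the flag is 1 only when a repeat follows: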
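WITH Input_table AS
(
    -- Typed timestamps, so that ordering by "time" is chronological rather than alphabetical
    SELECT 1 AS userid, TIMESTAMP '2020-01-10 08:00:00' AS "time", 'A' AS product
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 09:00:00', 'B'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 09:00:00', 'A'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 10:00:00', 'C'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 10:00:00', 'B'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 11:00:00', 'D'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 11:00:00', 'E'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 11:00:00', 'A'
),
cte2 AS
(
    -- Ordering by product first places equal products on adjacent rows,
    -- so LEAD(product) matches when the same product occurs again later.
    SELECT userid, "time", product,
           CASE WHEN LEAD(product) OVER (PARTITION BY userid ORDER BY product, "time") = product
                THEN 1 ELSE 0 END AS "Is_Repeated?"
    FROM Input_table
)
SELECT * FROM cte2 ORDER BY "time";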
If you want to flag every occurrence of a repeated product, use:
WITH Input_table AS
(
    -- Typed timestamps, so that ordering by "time" is chronological rather than alphabetical
    SELECT 1 AS userid, TIMESTAMP '2020-01-10 08:00:00' AS "time", 'A' AS product
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 09:00:00', 'B'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 09:00:00', 'A'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 10:00:00', 'C'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 10:00:00', 'B'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 11:00:00', 'D'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 11:00:00', 'E'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 11:00:00', 'A'
),
cte2 AS
(
    -- 1 when the same product occurs again on a later row
    SELECT userid, "time", product,
           CASE WHEN LEAD(product) OVER (PARTITION BY userid ORDER BY product, "time") = product
                THEN 1 ELSE 0 END AS "Is_Repeated?"
    FROM Input_table
),
cte3 AS
(
    -- One row per (userid, product): 1 if any occurrence was followed by a repeat
    SELECT userid, product, MAX("Is_Repeated?") AS "Is_Repeated?"
    FROM cte2
    GROUP BY userid, product
)
-- Join the per-product flag back onto every input row
SELECT a.userid, a.product, a."time", b."Is_Repeated?"
FROM Input_table a
INNER JOIN cte3 b ON a.userid = b.userid AND a.product = b.product
ORDER BY a."time";
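Since the question asks to avoid a self-join, the aggregate-and-join step above could probably be replaced by a window aggregate. The following is only a sketch under that assumption, not part of the original answer. The CTE name `flagged` and the column `is_next_same` are made up for illustration, and the inline rows are a shortened copy of the sample data.

```sql
WITH Input_table AS
(
    SELECT 1 AS userid, TIMESTAMP '2020-01-10 08:00:00' AS "time", 'A' AS product
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 09:00:00', 'B'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 09:00:00', 'A'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 10:00:00', 'C'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 10:00:00', 'B'
),
flagged AS
(
    -- 1 when the same product occurs again on a later row (hypothetical helper column)
    SELECT userid, "time", product,
           CASE WHEN LEAD(product) OVER (PARTITION BY userid ORDER BY product, "time") = product
                THEN 1 ELSE 0 END AS is_next_same
    FROM Input_table
)
-- Spread the per-product maximum over every row of that product, with no join
SELECT userid, "time", product,
       MAX(is_next_same) OVER (PARTITION BY userid, product) AS "Is_Repeated?"
FROM flagged
ORDER BY "time";
```

The extra CTE is needed because window functions cannot be nested inside one another. On these five rows, both A rows and both B rows get 1 and the single C row gets 0.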
Based on your data, "next partition" seems to mean within about an hour. If so, the logic would be:
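-- Assumes "time" is a TIMESTAMP column; flags a row when the same product
-- occurs again for the same user less than two hours later.
SELECT userid, "time", product,
       (CASE WHEN LEAD("time") OVER (PARTITION BY userid, product ORDER BY "time") < "time" + INTERVAL '2 hour'
             THEN 1 ELSE 0
        END) AS "Is_Repeated?"
FROM Input_table;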
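I tried both approaches. If you look at A, which repeats from 8:00 to 9:00, it correctly returns Is_Repeated = 1. But the A at 9:00 is not repeated at 10:00 and it still returns Is_Repeated = 1, whereas for the 9:00 timestamp only B should return Is_Repeated = 1. I want to flag a row as true (1) when the same product value exists for the same userid in the next timestamp.
What does "in the next partition" mean?
The next partition is the next timestamp. Is this possible if it is not always hourly? This is just a simple sample.
@user1701450 . . . I fixed the answer. "Partition" already has several meanings in relational databases. Your use of the word is confusing, but the comments clarify what you want to do. That information should also be included in the question.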
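For the clarified requirement (the same product for the same userid at the next distinct timestamp, whatever the gap), one possible approach that avoids both the self-join and the fixed interval is to number each user's distinct timestamps with DENSE_RANK and compare ranks. This is a sketch rather than the answer's actual revision. The names `ranked` and `time_rank` are invented for illustration, and it assumes each (userid, time, product) combination occurs at most once.

```sql
WITH Input_table AS
(
    SELECT 1 AS userid, TIMESTAMP '2020-01-10 08:00:00' AS "time", 'A' AS product
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 09:00:00', 'B'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 09:00:00', 'A'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 10:00:00', 'C'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 10:00:00', 'B'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 11:00:00', 'D'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 11:00:00', 'E'
    UNION ALL SELECT 1, TIMESTAMP '2020-01-10 11:00:00', 'A'
),
ranked AS
(
    -- Rows sharing a timestamp get the same rank; consecutive timestamps get
    -- consecutive ranks, regardless of how far apart they are.
    SELECT userid, "time", product,
           DENSE_RANK() OVER (PARTITION BY userid ORDER BY "time") AS time_rank
    FROM Input_table
)
-- A row is flagged when the next occurrence of the same product for the same
-- user sits in the immediately following timestamp group.
SELECT userid, "time", product,
       CASE WHEN LEAD(time_rank) OVER (PARTITION BY userid, product ORDER BY time_rank) = time_rank + 1
            THEN 1 ELSE 0 END AS "Is_Repeated?"
FROM ranked
ORDER BY "time", product;
```

With the input rows as listed in the question, only A at 8:00 and B at 9:00 are flagged 1. The B at 10:00 is flagged 0 because the input has E, not B, at 11:00. If the input contained B at 11:00, as the expected output table does, that row would be flagged 1 as well.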