Snowflake: window function 'RANGE' not supported, how to query?
I have a transaction table with txn_date and customer_id. For each customer who had a transaction in December, I want to know how many transactions that customer had in the 90 days before a given transaction. This seems like a query that could be run with a window function and a RANGE sliding window frame, but Snowflake does not support RANGE sliding window frames.
How can I run this query in Snowflake?
What about something like this:
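WITH T1 AS (
    -- December transactions
    SELECT CUSTOMER_ID, TX_DATE
    FROM TRANSACTIONS
    WHERE TX_DATE BETWEEN '2020-12-01' AND '2020-12-31')
SELECT T2.CUSTOMER_ID, T2.TX_DATE
FROM TRANSACTIONS T2
-- join each December transaction to the same customer's transactions (T1, not T2 = T2)
INNER JOIN T1 ON T2.CUSTOMER_ID = T1.CUSTOMER_ID
-- keep those dated within the 90 days up to and including the December date
WHERE T2.TX_DATE BETWEEN (T1.TX_DATE - 90) AND T1.TX_DATE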
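The query above lists the matching rows rather than counting them, and because BETWEEN is inclusive at both ends it also returns each December transaction itself. A minimal sketch of turning it into a count, assuming TX_DATE is a DATE column and that one result row per customer per December day is acceptable (the DISTINCT stops two same-day December transactions from doubling the count). The rows in the TRANSACTIONS CTE are made-up sample data standing in for the real table; drop that CTE to run it against the actual TRANSACTIONS table.

```sql
-- Hypothetical sample rows standing in for the real TRANSACTIONS table
WITH TRANSACTIONS AS (
    SELECT TX_DATE::DATE AS TX_DATE, CUSTOMER_ID
    FROM VALUES
        ('2020-12-03', 1),
        ('2020-12-03', 1),  -- second transaction on the same December day
        ('2020-11-04', 1),
        ('2020-09-02', 1)   -- more than 90 days before 2020-12-03
        v(TX_DATE, CUSTOMER_ID)
), DEC_DAYS AS (
    -- one row per customer per December day, so same-day duplicates
    -- do not multiply the counts below
    SELECT DISTINCT CUSTOMER_ID, TX_DATE
    FROM TRANSACTIONS
    WHERE TX_DATE BETWEEN '2020-12-01' AND '2020-12-31'
)
SELECT D.CUSTOMER_ID,
       D.TX_DATE,
       COUNT(T.TX_DATE) AS PRIOR_90_DAY_TXNS      -- 0 when nothing matched (LEFT JOIN)
FROM DEC_DAYS D
LEFT JOIN TRANSACTIONS T
       ON T.CUSTOMER_ID = D.CUSTOMER_ID
      AND T.TX_DATE >= DATEADD('day', -90, D.TX_DATE)
      AND T.TX_DATE <  D.TX_DATE                 -- strictly before: excludes the December day itself
GROUP BY D.CUSTOMER_ID, D.TX_DATE
ORDER BY D.CUSTOMER_ID, D.TX_DATE;
```

With these sample rows it should return a single row: customer 1 on 2020-12-03 with a count of 1, since 2020-11-04 is the only earlier date inside the window (2020-09-02 is more than 90 days back, and same-day rows fail the strict <).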
Starting from the same join as NickW's answer:
WITH data AS (
SELECT txn_date::timestamp_ntz as txn_date, cust_id, txn_id
FROM VALUES
('2020-12-04',0, 0),
('2020-12-03',1, 1),
('2020-11-04',1, 2),
('2020-10-04',1, 3),
('2020-09-04',1, 4), -- just on 90 days
('2020-09-02',1, 5), -- too far
('2021-01-05',1, 6) -- in the future
v(txn_date , cust_id, txn_id)
), dec_txn AS (
SELECT txn_id,
cust_id,
DATEADD('day',-90, txn_date) AS win_start,
txn_date AS win_end
FROM data
WHERE date_trunc('month', txn_date) = '2020-12-01'
)
SELECT dt.*
,t.*
,datediff('days', dt.win_end, t.txn_date) as win_time
FROM dec_txn AS dt
LEFT JOIN data AS t
ON t.cust_id = dt.cust_id
AND t.txn_date BETWEEN dt.win_start AND dt.win_end AND t.txn_id != dt.txn_id
;
which gives (the first four columns come from dt, the next three from t):
TXN_ID  CUST_ID  WIN_START                WIN_END                  TXN_DATE                 CUST_ID  TXN_ID  WIN_TIME
1       1        2020-09-04 00:00:00.000  2020-12-03 00:00:00.000  2020-11-04 00:00:00.000  1        2       -29
1       1        2020-09-04 00:00:00.000  2020-12-03 00:00:00.000  2020-10-04 00:00:00.000  1        3       -60
1       1        2020-09-04 00:00:00.000  2020-12-03 00:00:00.000  2020-09-04 00:00:00.000  1        4       -90
0       0        2020-09-05 00:00:00.000  2020-12-04 00:00:00.000  NULL                     NULL     NULL    NULL
So, to get a count per December transaction:
WITH data AS (
SELECT txn_date::timestamp_ntz as txn_date, cust_id, txn_id
FROM VALUES
('2020-12-04',0, 0),
('2020-12-03',1, 1),
('2020-11-04',1, 2),
('2020-10-04',1, 3),
('2020-09-04',1, 4), -- just on 90 days
('2020-09-02',1, 5), -- too far
('2021-01-05',1, 6) -- in the future
v(txn_date , cust_id, txn_id)
), dec_txn AS (
SELECT txn_id,
cust_id,
txn_date,
DATEADD('day',-90, txn_date) AS win_start,
txn_date AS win_end
FROM data
WHERE date_trunc('month', txn_date) = '2020-12-01'
)
SELECT dt.cust_id
,dt.txn_id
,dt.txn_date
,count(t.txn_id) as c__prior_90_days_transaction
FROM dec_txn AS dt
LEFT JOIN data AS t
ON t.cust_id = dt.cust_id
AND t.txn_date >= dt.win_start and t.txn_date < dt.win_end AND t.txn_id != dt.txn_id
GROUP BY 1,2,3
ORDER BY 1,2
;
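What the question does not clearly define is what should happen when a customer has several transactions in December, or several transactions on the same December day.
The query above returns one row for every December transaction of each customer, including when several happen on the same day. Because the join uses t.txn_date < dt.win_end, a transaction is only counted if it is strictly earlier: if txn_date has a time component, transactions earlier on the same day are counted; if it is a pure date (everything at midnight), same-day transactions are not counted at all.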
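One way to pin down the same-timestamp case, if each transaction has an id that increases with time (an assumption; the question only mentions txn_date and customer id), is to break timestamp ties on txn_id, so that each transaction counts only the ones that came before it. A sketch reusing the answer's sample data, with a hypothetical txn 7 added at exactly the same timestamp as txn 1:

```sql
-- The answer's sample data plus a hypothetical txn 7 sharing txn 1's timestamp
WITH data AS (
    SELECT txn_date::timestamp_ntz AS txn_date, cust_id, txn_id
    FROM VALUES
        ('2020-12-04', 0, 0),
        ('2020-12-03', 1, 1),
        ('2020-12-03', 1, 7),   -- same timestamp as txn 1
        ('2020-11-04', 1, 2),
        ('2020-10-04', 1, 3),
        ('2020-09-04', 1, 4),   -- just on 90 days
        ('2020-09-02', 1, 5)    -- too far
        v(txn_date, cust_id, txn_id)
), dec_txn AS (
    SELECT txn_id,
           cust_id,
           txn_date,
           DATEADD('day', -90, txn_date) AS win_start
    FROM data
    WHERE date_trunc('month', txn_date) = '2020-12-01'
)
SELECT dt.cust_id
      ,dt.txn_id
      ,dt.txn_date
      ,count(t.txn_id) AS c__prior_90_days_transaction
FROM dec_txn AS dt
LEFT JOIN data AS t
       ON t.cust_id = dt.cust_id
      AND t.txn_date >= dt.win_start
      -- strictly earlier, or the same timestamp with a smaller txn_id
      -- (ASSUMES txn_id increases in transaction order)
      AND (t.txn_date < dt.txn_date
           OR (t.txn_date = dt.txn_date AND t.txn_id < dt.txn_id))
GROUP BY 1,2,3
ORDER BY 1,2;
```

Here txn 0 should get 0, txn 1 should get 3 (txns 2, 3 and 4) and txn 7 should get 4, because txn 1 shares its timestamp but has the smaller id. If ids are not time-ordered, this tie-break is arbitrary and some other rule is needed.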
If instead you want only the previous days, and txn_date is just a date (no time), this condition already does that:
AND t.txn_date >= dt.win_start and t.txn_date < dt.win_end AND t.txn_id != dt.txn_id
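If txn_date does carry a time of day but you want whole days (so that a midnight transaction counts toward its own day, or because your timestamps are not all at midnight), you have to truncate the window bounds to days. The join condition stays the same: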
AND t.txn_date >= dt.win_start and t.txn_date < dt.win_end AND t.txn_id != dt.txn_id
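dec_txn AS (
SELECT txn_id,
cust_id,
txn_date, -- still needed by the outer SELECT
DATEADD('day',-90, txn_date::date) AS win_start, -- midnight, 90 days before the transaction's day
txn_date::date AS win_end -- midnight of the transaction's own day
FROM data
WHERE date_trunc('month', txn_date) = '2020-12-01'
)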
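For reference, a sketch of that day-truncated dec_txn dropped into the complete counting query, run against a few made-up rows that carry times of day (the rows and ids are hypothetical, chosen only to exercise the window boundaries). The t.txn_id != dt.txn_id test is left out because t.txn_date < dt.win_end already rules out the December transaction itself.

```sql
-- Hypothetical sample data WITH times of day (not from the original answer)
WITH data AS (
    SELECT txn_date::timestamp_ntz AS txn_date, cust_id, txn_id
    FROM VALUES
        ('2020-12-03 15:00:00', 1, 1),
        ('2020-12-03 09:00:00', 1, 7),  -- earlier the same day
        ('2020-11-04 12:00:00', 1, 2),
        ('2020-09-04 08:00:00', 1, 4),  -- inside the window only because it is truncated to days
        ('2020-09-03 23:00:00', 1, 5)   -- before the window start
        v(txn_date, cust_id, txn_id)
), dec_txn AS (
    SELECT txn_id,
           cust_id,
           txn_date,
           DATEADD('day', -90, txn_date::date) AS win_start,  -- midnight, 90 days earlier
           txn_date::date AS win_end                          -- midnight of the txn's own day
    FROM data
    WHERE date_trunc('month', txn_date) = '2020-12-01'
)
SELECT dt.cust_id
      ,dt.txn_id
      ,dt.txn_date
      ,count(t.txn_id) AS c__prior_90_days_transaction
FROM dec_txn AS dt
LEFT JOIN data AS t
       ON t.cust_id = dt.cust_id
      AND t.txn_date >= dt.win_start
      AND t.txn_date <  dt.win_end   -- whole previous days only; same-day txns excluded
GROUP BY 1,2,3
ORDER BY 1,2;
```

Both December transactions (ids 1 and 7) should come back with a count of 2: txns 2 and 4 are counted, the other same-day transaction is not, and txn 5 falls before midnight of 2020-09-04. Without the ::date truncation, txn 1's window would start at 2020-09-04 15:00 and txn 4 would drop out.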