Snowflake Cloud Data Platform: nested window function not working in Snowflake
I am migrating Spark SQL to SnowSQL. In one case I used nested window functions in Spark SQL, and I want to migrate that query to Snowflake. However, Snowflake does not support nested window functions.
Spark SQL query -
SELECT
    *,
    (case when (
        (
            lead(timestamp - lag(timestamp)
                over (partition by session_id order by timestamp))
            over (partition by session_id order by timestamp)
        ) is not null)
    then
        (
            lead(timestamp - lag(timestamp)
                over (partition by session_id order by timestamp))
            over (partition by session_id order by timestamp)
        )
    else 0 end)/1000 as pg_to_pg
FROM dwell_time_step2
Output - [screenshot not available]
I have tried to convert the above query to Snowflake as shown below.
Converted SnowSQL -
with lagsession as (
    SELECT
        a.*,
        lag(timestamp) over (partition BY session_id order by timestamp asc) lagsession
    FROM mktg_web_wi.dwell_time_step2 a
)
select
    a.*,
    nvl(lead(a.timestamp - b.lagsession) over (partition BY a.session_id order by a.timestamp),0)/1000 pg_to_pg
FROM mktg_web_wi.dwell_time_step2 a,
    lagsession b
WHERE a.key=b.key
order by timestamp;
Output - [screenshot not available]
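Here, the problem is in the SnowSQL output: the dwell time values are assigned to different URLs.
I expect the Spark SQL query to work in SnowSQL, with the same output in both cases.
If anyone knows how to solve this problem, please let me know.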
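Thanks

I think changing this from a nested window function to a CTE has changed which records the lag and lead refer to, but that is hard for me to get my head around. Anyway, if I understand the code here, I think there is a simpler way with only one window function. For each row, the nested expression evaluates to the next row's timestamp minus the current row's timestamp within the session, so a single lead is enough: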
select
    a.*,
    -- next page's timestamp minus this page's timestamp; 0 for the last page in a session
    nvl(lead(a.timestamp) over (partition BY a.session_id order by a.timestamp) - a.timestamp, 0)/1000 pg_to_pg
FROM mktg_web_wi.dwell_time_step2 a
order by timestamp;
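If the nested structure needs to be kept closer to the Spark original (for example, when the inner expression is harder to simplify), one possible approach is to stage the inner window function in a CTE and apply the outer window function to its result, with no self-join. The following is only a sketch under assumptions: it assumes `timestamp` is a numeric epoch value in milliseconds (as the `/1000` suggests), and `step_diff` and `diff_from_prev` are hypothetical names introduced here for illustration.

```sql
-- Sketch: evaluate the inner window function first, then the outer one.
-- Assumption: timestamp is a numeric epoch value in milliseconds.
with step_diff as (
    select
        a.*,
        -- inner expression of the Spark query: time since the previous page
        -- (diff_from_prev is a hypothetical column name)
        a.timestamp
            - lag(a.timestamp) over (partition by a.session_id order by a.timestamp)
            as diff_from_prev
    from mktg_web_wi.dwell_time_step2 a
)
select
    s.*,
    -- outer expression: take the next row's difference, defaulting to 0
    nvl(
        lead(s.diff_from_prev) over (partition by s.session_id order by s.timestamp),
        0
    ) / 1000 as pg_to_pg
from step_diff s
order by s.timestamp;
```

Because both window functions run over the same rows in the same order, no join on `key` is involved, which avoids any row duplication if `key` turns out not to be unique. Note that `s.*` also returns the helper column `diff_from_prev`; list the columns explicitly if it should not appear in the output.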