In Snowflake SQL, how do I count distinct values using partition by and order by?

My data looks like this:

| user | eventorder | postal |
|:---- |:----------:| ------:|
| A    | 1          | 60616  |
| A    | 2          | 10000  |
| A    | 3          | 60616  |
| B    | 1          | 20000  |
| B    | 2          | 30000  |
| B    | 3          | 40000  |
| B    | 4          | 30000  |
| B    | 5          | 20000  |

The problem I need to solve is: how many distinct stops has the user travelled to by each event order?

The ideal result should look like this:

| user | eventorder | postal | travelledStop |
|:---- |:----------:| ------:| -------------:|
| A    | 1          | 60616  | 1             |
| A    | 2          | 10000  | 2             |
| A    | 3          | 60616  | 2             |
| B    | 1          | 20000  | 1             |
| B    | 2          | 30000  | 2             |
| B    | 3          | 40000  | 3             |
| B    | 4          | 30000  | 3             |
| B    | 5          | 20000  | 3             |
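Take user A as an example. At event order 1, the user has travelled only to 60616, so 1 stop. At event order 2, the user has travelled to 60616 and 10000, so 2 stops. At event order 3, the distinct stops the user has travelled to are still 60616 and 10000, so 2 stops.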
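To make the expected semantics concrete, here is a minimal sketch in plain Python rather than Snowflake SQL. The function name `running_distinct_stops` and the tuple layout are assumptions made for illustration only; the sketch just walks each user's events in order and counts the distinct postal codes seen so far.

```python
from collections import defaultdict


def running_distinct_stops(rows):
    """Return (user, eventorder, postal, travelled_stop) for each input row.

    rows is an iterable of (user, eventorder, postal) tuples.
    travelled_stop is the number of distinct postals the user has
    visited up to and including that event.
    """
    seen = defaultdict(set)  # user -> set of postals visited so far
    result = []
    # Process events per user in event order
    for user, eventorder, postal in sorted(rows, key=lambda r: (r[0], r[1])):
        seen[user].add(postal)
        result.append((user, eventorder, postal, len(seen[user])))
    return result


rows = [
    ("A", 1, "60616"), ("A", 2, "10000"), ("A", 3, "60616"),
    ("B", 1, "20000"), ("B", 2, "30000"), ("B", 3, "40000"),
    ("B", 4, "30000"), ("B", 5, "20000"),
]

for row in running_distinct_stops(rows):
    print(row)
```

The last element of each printed tuple corresponds to the travelledStop column in the ideal result.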

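I am not allowed to use count distinct together with partition by and order by. I would like to do something like count(distinct(postal)) over (partition by user order by eventorder), but this is not allowed.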


Does anyone know how to solve this? Thanks a lot!

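I used the sample data you provided (only a subset of the sample, but this should scale). The goal here is basically to generate, for each row, an array that accumulates the postals of all events up to that row, and then count the distinct values in it.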

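-- Sample data (a subset of the question's rows)
with _temp as (
    select 'A' as usr, 1 as EventOrder, '60616' as Postal
    union all
    select 'A' as usr, 2 as EventOrder, '10000' as Postal
    union all
    select 'A' as usr, 3 as EventOrder, '60616' as Postal
),
-- For each row, keep the first `eventorder` postals of the user's ordered array
-- (this relies on eventorder running 1, 2, 3, ... without gaps within each user)
_intermediate as (
    select usr
        , eventorder
        , postal
        , array_slice(
              array_agg(postal)
                within group (order by eventorder)
                over (partition by usr)
              , 0, eventorder) as full_array
    from _temp
    group by usr, eventorder, postal
)
-- Flatten each row's array and count its distinct postals
select usr, eventorder, postal, count(distinct f.value) as travelledStop
from _intermediate i, lateral flatten(input => i.full_array) f
group by usr, eventorder, postal
order by usr, eventorder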

Perhaps the simplest method is to use a subquery and count the "1"s:
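One possible reading of this suggestion is sketched below. It is an assumption, not the answerer's confirmed query: an inner query numbers each user's visits to the same postal with `row_number()`, and the outer query keeps a running count of the rows numbered 1. To keep the sketch runnable it uses Python's built-in `sqlite3` module as a stand-in for Snowflake (this assumes the bundled SQLite is version 3.25 or later, which added window functions). The table name `events`, the alias `seqnum`, and the helper `travelled_stops` are all hypothetical.

```python
import sqlite3

# Inner query: seqnum = 1 marks the first time a user visits a postal.
# Outer query: the running sum of those first visits is the number of
# distinct stops travelled so far.
QUERY = """
SELECT usr, eventorder, postal,
       SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY usr ORDER BY eventorder) AS travelledStop
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY usr, postal
                              ORDER BY eventorder) AS seqnum
    FROM events t
) AS numbered
ORDER BY usr, eventorder
"""


def travelled_stops(rows):
    """Load (usr, eventorder, postal) rows into SQLite and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (usr TEXT, eventorder INTEGER, postal TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


rows = [
    ("A", 1, "60616"), ("A", 2, "10000"), ("A", 3, "60616"),
    ("B", 1, "20000"), ("B", 2, "30000"), ("B", 3, "40000"),
    ("B", 4, "30000"), ("B", 5, "20000"),
]

for row in travelled_stops(rows):
    print(row)
```

The SQL text uses only standard window functions, so it should carry over to Snowflake with the real table name substituted, although that has not been verified here.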


I like @Daniel Zagales's answer, but here is a workaround that uses dense_rank and sum:

-- Sample data from the question
with temp as (
    select 'A' as usr, 1 as EventOrder, '60616' as Postal
    union all
    select 'A' as usr, 2 as EventOrder, '10000' as Postal
    union all
    select 'A' as usr, 3 as EventOrder, '60616' as Postal
    union all
    select 'B' as usr, 1 as EventOrder, '20000' as Postal
    union all
    select 'B' as usr, 2 as EventOrder, '30000' as Postal
    union all
    select 'B' as usr, 3 as EventOrder, '40000' as Postal
    union all
    select 'B' as usr, 4 as EventOrder, '30000' as Postal
    union all
    select 'B' as usr, 5 as EventOrder, '20000' as Postal
),
-- rks = 1 marks the first time a user visits a given postal
temp2 as (
    select temp.*
        , dense_rank() over (partition by usr, Postal order by EventOrder) as rks
    from temp
)
-- Running count of first visits = distinct stops travelled so far
select usr, eventorder, postal
    , sum(case when rks = 1 then 1 else 0 end)
        over (partition by usr order by EventOrder) as travelledStop
from temp2
order by usr, EventOrder
Basically, dense_rank is used to flag the first occurrence of each stop, and those flags are then summed up.


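Very nice solution! I was trying to do the same thing but could not figure out how to build the array for each row (I was thinking of a window frame, but that is not supported). array_slice() is a nice way to do it.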