SQL window function: sum only over distinct values in another column

Tags: sql, window-functions, snowflake-cloud-data-platform

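Note: this question seems to have received a fair number of views, so I thought it best to update it for clarity. Most of the changes are cosmetic; the only substantial one is that I added a month column to weights_table. The weights table is a monthly table, so technically this does not matter, but having a month column in both tables makes the relationship between them more obvious and logical.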

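Question

I have a query that uses two tables, [person_table] and [weights_table].

What I want is to change the final calculation so that the denominator, sum(sum(b.weight)) over (...), is computed as the sum of the weights of distinct person_ids per month, rather than the sum over the per-movie, per-month sums of weights of distinct person_ids. Is there a simpler way to do this without adding another subquery?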

Sample of person_table

Sample of weights_table

Expected result

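Metric definitions:

Raw: the count of all distinct person_ids per movie per month

Weighted: the sum of the weights of distinct person_ids per movie per month

Share: the ratio of Weighted to the sum of the weights of the distinct person_ids for that month that match person_table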
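To make the three definitions concrete, here is a minimal Python sketch that computes them by hand. The person rows and weights are copied from the sample values in the CTE-based answer in this thread; keying the weights by month is an assumption based on the note at the top of the question, and dropping person_ids that have no weight mirrors the inner join used in the answers.

```python
from collections import defaultdict

# (month, movie, person_id) rows, copied from the sample values in the CTE answer.
person_rows = [
    (1, "a", 1), (1, "b", 1), (1, "b", 2), (1, "a", 2), (1, "c", 3),
    (1, "d", 4), (1, "a", 2), (1, "c", 3), (1, "a", 6),
]
# (month, person_id) -> weight. The month key is an assumption (monthly weights table).
weights = {(1, 1): 12, (1, 2): 34, (1, 3): 65, (1, 4): 76, (1, 999): 999}


def movie_metrics(person_rows, weights):
    """Return {(month, movie): (raw, weighted, share)}."""
    per_movie = defaultdict(set)  # (month, movie) -> distinct person_ids with a weight
    per_month = defaultdict(set)  # month -> distinct person_ids with a weight
    for month, movie, person_id in person_rows:
        if (month, person_id) in weights:  # inner-join semantics
            per_movie[(month, movie)].add(person_id)
            per_month[month].add(person_id)
    # Denominator of share: one total per month, independent of movie.
    total = {m: sum(weights[(m, p)] for p in ids) for m, ids in per_month.items()}
    result = {}
    for (month, movie), ids in sorted(per_movie.items()):
        weighted = sum(weights[(month, p)] for p in ids)
        result[(month, movie)] = (len(ids), weighted, weighted / total[month])
    return result


metrics = movie_metrics(person_rows, weights)
for key, (raw, weighted, share) in metrics.items():
    print(key, raw, weighted, round(share, 2))
```

With these values, movie a in month 1 has raw 2, weighted 46 and share 46/187. Person 6 has no weight and person 999 never appears in person_table, so neither contributes to any metric.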

Possibly something like:

select a.month,
    a.movie,
    count(a.person_id) as raw,
    sum(b.weight) as weighted,
    100*weighted/c.ttl_weight as share
from (select distinct month, movie, person_id from person_table) a 
inner join weights_table b on a.person_id=b.person_id
cross join (select sum(weight) as ttl_weight from weights_table w
            where exists (select 1 
                          from person_table p 
                          where w.person_id=p.person_id)
           ) c
group by a.month, a.movie, c.ttl_weight
;

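In case this ugly workaround helps anyone: what I did was scale the weights down in a subquery/CTE, to simulate the effect of summing only the unique weights in the outer query.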

select month,
       movie,
       count(distinct person_id) as raw,
       sum(w1) as weighted,
       sum(w1)/1.0/sum(sum(w2)) over() as share
from (select a.*, 
             b.weight/count(*) over (partition by a.month, a.movie, a.person_id) w1, 
             b.weight/count(*) over (partition by a.month, a.person_id) w2
      from person_table a 
      join weights_table b on a.month=b.month and a.person_id=b.person_id) t
group by t.month, t.movie;

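I can't say I'm proud of this solution, because it is only useful if this kind of data is queried frequently, in which case it would make sense to store the result of the subquery in a permanent monthly table. But since I only use it once or twice a month, I would prefer a more efficient query structure, even at the cost of verbosity.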
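The weight-splitting idea in the workaround above can be checked with a few lines of Python. This is only a sketch on hypothetical joined rows (they are not from the question): each row's weight is divided by the number of duplicate rows in its partition, so summing the fractions gives back each distinct person's weight exactly once.

```python
from collections import Counter, defaultdict

# Hypothetical rows after joining person_table to weights_table:
# (month, movie, person_id, weight). Person 2 watched movie 'a' twice.
joined = [
    (1, "a", 1, 12),
    (1, "a", 2, 34),
    (1, "a", 2, 34),
    (1, "b", 1, 12),
    (1, "b", 2, 34),
]

# Equivalent of count(*) over (partition by month, movie, person_id)
dup_per_movie = Counter((m, mv, p) for m, mv, p, _ in joined)
# Equivalent of count(*) over (partition by month, person_id)
dup_per_month = Counter((m, p) for m, _, p, _ in joined)

w1_by_movie = defaultdict(float)  # sum(w1) grouped by (month, movie)
w2_by_month = defaultdict(float)  # sum(w2) grouped by month
for m, mv, p, w in joined:
    w1_by_movie[(m, mv)] += w / dup_per_movie[(m, mv, p)]
    w2_by_month[m] += w / dup_per_month[(m, p)]

for (m, mv), w1 in sorted(w1_by_movie.items()):
    print(m, mv, round(w1, 6), round(w1 / w2_by_month[m], 6))
```

Because the fractions are floating-point values, sums such as three times 34/3 should be compared with a tolerance rather than with exact equality.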

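Ah, I put just one month of data in the table and broke the sub-selects out into CTEs to see whether a pattern would show up. I didn't see any, so to me this seems to come down to how you like to write your SQL.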

with person_table as (
    select column1 as month, column2 as movie, column3 as person_id, column4 as unique_visit_id
    from values (1, 'a', 1, 1),  
        (1, 'b', 1, 2),
        (1, 'b', 2, 3),
        (1, 'a', 2, 4),
        (1, 'c', 3, 5),
        (1, 'd', 4, 6),
        (1, 'a', 2, 7),
        (1, 'c', 3, 8),
        (1, 'a', 6, 9)
), weight_table as (
    select column1 as person_id, column2 as weight
    from values (1, 12), (2, 34), (3, 65), (4, 76), (999,999)
), dis_month_people as (
    select distinct month, person_id 
    from person_table
), month_share as (
    select month, sum(weight) as total_weight
    from dis_month_people dp
    join weight_table w on dp.person_id = w.person_id
    group by 1
), dis_month_movie_people as (
    select distinct month, movie, person_id
    from person_table
)
select t.* --, weighted, total_weight
    ,t.weighted/m.total_weight as share
from (
  select 
    a.month,
    a.movie,
    count(a.person_id) as raw,
    sum(b.weight) as weighted
  from dis_month_movie_people a 
  join weight_table b on a.person_id = b.person_id
  group by 1,2
) AS t
join month_share m on t.month = m.month 
order by 1,2;

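Sample data and desired results would really help, as would definitions of the metrics you want to calculate.

Let me add them in.

If I understand correctly: if a person watches the same movie more than once in a month, you want raw and weighted to count them only once? But you want all of them to count towards share. Is that right?

@MikeWalton For raw and weighted, that is correct, but if the same person watches a different movie, that counts as 2. For the denominator of share, it doesn't matter which movies they watched. I just want to sum the corresponding weights of the distinct person_ids for that month.

Could you please show us your expected results for that sample data?

This code doesn't actually work, because your total weight for the monthly share is taken across all months. select t.*, weighted, total_weight, weighted/total_weight as share shows that the total is always 187, which is not what you described.

The sample data contains only one month, so the result is correct there, but with multiple months this code is wrong. If you add a user row to the weights that does not exist in the monthly data, that row gets summed as well.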
Expected result for the sample data:

+-------+-------+-----+----------+-------+
| month | movie | raw | weighted | share |
+-------+-------+-----+----------+-------+
|     1 | a     |   2 |       46 |  0.25 | --(12+34)/(12+34+65+76)=0.25
|     1 | b     |   2 |       46 |  0.25 |
|     1 | c     |   1 |       65 |  0.35 | --65/(12+34+65+76)=0.35
|     1 | d     |   1 |       76 |  0.41 |
+-------+-------+-----+----------+-------+
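The multi-month concern raised in the comments can be illustrated with a small runnable sketch. It uses Python's built-in sqlite3 module rather than Snowflake, the two-month data is hypothetical, and it assumes weights_table has a month column as described in the question's note. The point is only that both the weight join and the monthly total are keyed on month, so each month gets its own denominator and weight rows with no matching person are ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-month data; table and column names follow the question.
conn.executescript("""
create table person_table (month int, movie text, person_id int);
create table weights_table (month int, person_id int, weight real);
insert into person_table values
  (1,'a',1),(1,'a',2),(1,'a',2),(1,'b',1),
  (2,'a',1),(2,'b',3);
insert into weights_table values
  (1,1,12),(1,2,34),(2,1,10),(2,3,30),(2,999,999);
""")

query = """
with movie_people as (
    select distinct month, movie, person_id from person_table
), month_people as (
    select distinct month, person_id from person_table
), month_total as (
    -- one denominator per month, over distinct persons seen that month
    select mp.month, sum(w.weight) as total_weight
    from month_people mp
    join weights_table w on w.month = mp.month and w.person_id = mp.person_id
    group by mp.month
)
select a.month, a.movie,
       count(a.person_id) as raw,
       sum(b.weight) as weighted,
       sum(b.weight) / t.total_weight as share
from movie_people a
join weights_table b on b.month = a.month and b.person_id = a.person_id
join month_total t on t.month = a.month
group by a.month, a.movie, t.total_weight
order by a.month, a.movie
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this data the month 1 total is 46 and the month 2 total is 40, so the shares within each month are computed against different denominators, and the (2, 999, 999) weight row never enters either total.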