SQL: selecting the subset of rows that exceeds a percentage of the total value

Tags: sql, sql-server, sql-server-2008, tsql, cumulative-sum

I have a table of customers, users and revenue, with thousands of records similar to the ones below:

Customer   User    Revenue
001        James   500
002        James   750
003        James   450
004        Sarah   100
005        Sarah   500
006        Sarah   150
007        Sarah   600
008        James   150
009        James   100
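
What I'd like to do is return only the top-spending customers that together make up 80% of each user's total revenue.

To do this manually, I would sort James's customers by revenue, work out each one's percentage of the total and the running-total percentage, then return only the records up to the point where the running total reaches 80%: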

Customer    User    Revenue     % of total  Running Total %
002         James   750         0.38        0.38 
001         James   500         0.26        0.64 
003         James   450         0.23        0.87  <- Greater than 80%, last record
008         James   150         0.08        0.95 
009         James   100         0.05        1.00 

I've tried using a CTE but so far have drawn a blank. Is there any way to do this with a single query rather than manually in an Excel sheet?

SQL Server 2012+ only

You can use a windowed SUM:
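
As an illustration of the windowed-sum idea, here is a minimal sketch. It assumes the table is named t (the name used in the later queries) with columns Customer, [User] and Revenue; it is not necessarily the exact query the answer used. It keeps only rows whose running percentage is at most 80%.

```sql
-- Sketch only: assumes a table t(Customer, [User], Revenue); needs SQL Server 2012+
-- because SUM(...) OVER (... ORDER BY ...) is not available in earlier versions.
WITH cte AS
(
    SELECT Customer, [User], Revenue
           -- this row's share of the user's total revenue
           ,percentile = 1.0 * Revenue
                         / NULLIF(SUM(Revenue) OVER (PARTITION BY [User]), 0)
           -- cumulative share, largest revenue first
           ,running_percentile = 1.0 * SUM(Revenue) OVER (PARTITION BY [User]
                                                          ORDER BY Revenue DESC)
                                 / NULLIF(SUM(Revenue) OVER (PARTITION BY [User]), 0)
    FROM t
)
SELECT *
FROM cte
WHERE running_percentile <= 0.8
ORDER BY [User], Revenue DESC;
```

With the sample data this filter stops before the row that pushes the running total past 80%, so James's 450 row and Sarah's 500 row are left out.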

Output:

╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User  ║ Revenue ║   percentile   ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║        2 ║ James ║     750 ║ 0,384615384615 ║ 0,384615384615     ║
║        1 ║ James ║     500 ║ 0,256410256410 ║ 0,641025641025     ║
║        7 ║ Sarah ║     600 ║ 0,444444444444 ║ 0,444444444444     ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝

Edit 2:

"Looks almost there, the only issue is it's missing the last row. James's third row takes the running total over 0.80 but needs to be included."

Output with the row that crosses the 80% threshold included:

╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User  ║ Revenue ║   percentile   ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║        2 ║ James ║     750 ║ 0,384615384615 ║ 0,384615384615     ║
║        1 ║ James ║     500 ║ 0,256410256410 ║ 0,641025641025     ║
║        3 ║ James ║     450 ║ 0,230769230769 ║ 0,871794871794     ║
║        7 ║ Sarah ║     600 ║ 0,444444444444 ║ 0,444444444444     ║
║        5 ║ Sarah ║     500 ║ 0,370370370370 ║ 0,814814814814     ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝
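
One way to keep the row that crosses the threshold is to test the running total before each row instead of after it. The following is a minimal sketch of that idea for SQL Server 2012+, not necessarily the query the answer used. It assumes a table t(Customer, [User], Revenue) with non-negative revenue, and the tie-break on Customer is an added assumption to make the ordering deterministic.

```sql
-- Sketch only: assumes t(Customer, [User], Revenue), Revenue >= 0, SQL Server 2012+.
WITH cte AS
(
    SELECT Customer, [User], Revenue
           ,total   = SUM(Revenue) OVER (PARTITION BY [User])
           -- ROWS frame: running total row by row, ties broken by Customer
           ,running = SUM(Revenue) OVER (PARTITION BY [User]
                                         ORDER BY Revenue DESC, Customer
                                         ROWS UNBOUNDED PRECEDING)
    FROM t
)
SELECT Customer, [User], Revenue
       ,percentile         = 1.0 * Revenue / NULLIF(total, 0)
       ,running_percentile = 1.0 * running / NULLIF(total, 0)
FROM cte
-- running - Revenue is the total accumulated before this row;
-- keep the row while that amount is still under 80% of the user's total
WHERE running - Revenue < 0.8 * total
ORDER BY [User], Revenue DESC, Customer;
```

For James the total is 1950, so the cut-off is 1560: the 450 row is kept because only 1250 had accumulated before it, while the 150 row is dropped because 1700 had.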


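"Looks perfect, translated it to my large table and it returns what I need. Spent a full 5 minutes working through it and still can't follow what you've done."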

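SQL Server 2008 does not support everything in the OVER clause (aggregate window functions cannot take an ORDER BY there), but ROW_NUMBER is supported.

First, a cte only calculates the position within each group: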

╔═══════════╦════════╦══════════╦════╗
║ Customer  ║ User   ║ Revenue  ║ rn ║
╠═══════════╬════════╬══════════╬════╣
║        2  ║ James  ║     750  ║  1 ║
║        1  ║ James  ║     500  ║  2 ║
║        3  ║ James  ║     450  ║  3 ║
║        8  ║ James  ║     150  ║  4 ║
║        9  ║ James  ║     100  ║  5 ║
║        7  ║ Sarah  ║     600  ║  1 ║
║        5  ║ Sarah  ║     500  ║  2 ║
║        6  ║ Sarah  ║     150  ║  3 ║
║        4  ║ Sarah  ║     100  ║  4 ║
╚═══════════╩════════╩══════════╩════╝
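
Second cte:

The c2 subquery calculates the running total based on the rank from ROW_NUMBER.
The c3 subquery calculates the full sum per user.
In the final query, the s subquery finds the lowest running percentage that is at least 80%.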
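
The steps described above can be sketched as follows for SQL Server 2008. This is a reconstruction under assumptions rather than the answer's own query: the table name t and the subquery aliases c2, c3 and s follow the text, while the name cte2, the inner alias x and the tie-break on Customer are hypothetical additions.

```sql
-- Sketch only: assumes t(Customer, [User], Revenue); works on SQL Server 2008.
WITH cte AS
(
    -- position of each customer within its user, largest revenue first
    SELECT Customer, [User], Revenue
           ,rn = ROW_NUMBER() OVER (PARTITION BY [User]
                                    ORDER BY Revenue DESC, Customer)
    FROM t
), cte2 AS
(
    SELECT c.Customer, c.[User], c.Revenue
           ,percentile         = 1.0 * c.Revenue / NULLIF(c3.s, 0)
           ,running_percentile = 1.0 * c2.s      / NULLIF(c3.s, 0)
    FROM cte c
    -- c2: running total over all rows ranked at or before the current row
    CROSS APPLY (SELECT SUM(x.Revenue) AS s
                 FROM cte x
                 WHERE x.[User] = c.[User] AND x.rn <= c.rn) AS c2
    -- c3: full sum per user
    CROSS APPLY (SELECT SUM(x.Revenue) AS s
                 FROM cte x
                 WHERE x.[User] = c.[User]) AS c3
)
SELECT a.*
FROM cte2 a
-- s: lowest running percentage per user that is at least 80%
CROSS APPLY (SELECT MIN(x.running_percentile) AS rp
             FROM cte2 x
             WHERE x.running_percentile >= 0.8
               AND x.[User] = a.[User]) AS s
WHERE a.running_percentile <= s.rp
ORDER BY a.[User], a.Revenue DESC;
```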

Edit 3:

Using ROW_NUMBER is actually redundant:

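WITH cte AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
           -- this row's share of the user's total revenue
           ,percentile         = 1.0 * c.Revenue / NULLIF(c3.s,0)
           -- share covered by this row plus all rows with larger revenue
           ,running_percentile = 1.0 * c2.s      / NULLIF(c3.s,0)
    FROM t c
    -- running total: this user's rows with revenue >= the current row's
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM t c2
          WHERE c.[User] = c2.[User]
            AND c2.Revenue >= c.Revenue) c2
    -- grand total per user
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM t c2
          WHERE c.[User] = c2.[User]) AS c3
) 
SELECT a.*
FROM cte a
-- per user, the smallest running percentile that reaches 80%
CROSS APPLY (SELECT MIN(running_percentile) AS rp
             FROM cte c2
             WHERE running_percentile >= 0.8
               AND c2.[User] = a.[User]) AS s
-- keep every row up to and including the one that crosses the threshold
WHERE a.running_percentile <= s.rp
ORDER BY [User], Revenue DESC;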

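In SQL Server 2012+, you would use a cumulative sum, which is more efficient. In SQL Server 2008, you can do this with a correlated subquery or with CROSS APPLY (see the query at the end).

Note: the * 1.0 is only there in case Revenue is stored as an integer. SQL Server does integer division, which would return 0 for both columns on almost all rows.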
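
To illustrate the note, here is a small example using literal values from the sample data (500 out of James's total of 1950). The column aliases are made up for the illustration.

```sql
-- Both operands of the first expression are integers, so the result is truncated.
-- Multiplying by 1.0 first turns the numerator into a decimal before dividing.
SELECT 500 / 1950       AS int_share,      -- 0
       500 * 1.0 / 1950 AS decimal_share;  -- roughly 0.2564
```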

Edit:

If you only want the results for James, add where [User] = 'James'.

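Comments:

"While the [% of Total] column seems to work for a single user, the running total seems to be all over the place."

"@bendataclear. Your original question had only one user. Adjusting this to run the total per user is trivial, and it is much simpler than lad's answer."

"First, the sum around t.Revenue is not needed. It won't work because there is no GROUP BY, or am I missing something? Second, user should be quoted as [user], otherwise you get an error. Third: SUM OVER calculates the percentage over the whole table rather than per user. And there is no filtering at all."

"This certainly works. It is an APPLY that uses an aggregate for each row. You might want to check the documentation on APPLY, or try it yourself."

"@GordonLinoff please check. Even after removing the sum and wrapping user in [], the percentage is still computed against sum(t.Revenue) over the whole table. As it stands, the code does not even run."

"Looks almost there, the only issue is it's missing the last row. James's third row takes the running total over 0.80 but needs to be included. If that's not possible it's not a disaster, though."

"Looks perfect, translated it to my large table and it returns what I need. Spent a full 5 minutes working through it and still can't follow what you've done! Many thanks."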

-- SQL Server 2008: per-user running total via CROSS APPLY.
-- [User] is bracketed because USER is a reserved word in T-SQL.
select t.*,
       t.Revenue * 1.0 / sum(t.Revenue) over (partition by t.[User]) as [% of Total],
       t2.RunningRevenue * 1.0 / sum(t.Revenue) over (partition by t.[User]) as [Running Total %]
from t cross apply
     (select sum(t2.Revenue) as RunningRevenue
      from t t2
      where t2.Revenue >= t.Revenue and t2.[User] = t.[User]
     ) t2;