SQL: select the subset of rows that exceeds a percentage of the total value
I have a table of customers, users, and revenue, with thousands of records similar to the following:
Customer User Revenue
001 James 500
002 James 750
003 James 450
004 Sarah 100
005 Sarah 500
006 Sarah 150
007 Sarah 600
008 James 150
009 James 100
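What I would like to do is return only the highest-spending customers that together make up 80% of a user's total revenue.
To do this manually, I would sort James's customers by revenue, work out each one's percentage of the total and the running-total percentage, and then return only the records up to the point where the running total reaches 80%: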
Customer User Revenue % of total Running Total %
002 James 750 0.38 0.38
001 James 500 0.26 0.64
003 James 450 0.23 0.87 <- Greater than 80%, last record
008 James 150 0.08 0.95
009 James 100 0.05 1.00
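I have tried using a CTE but have drawn a blank so far. Is there a way to do this in a single query rather than manually in an Excel sheet?
SQL Server 2012+ only:
You can use a windowed SUM: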
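The query itself did not survive in this copy of the answer. The following is a minimal sketch of what a windowed-SUM approach could look like on SQL Server 2012+, not necessarily the answerer's exact query. The table variable `@t` and the `Customer` tie-breaker in the `ORDER BY` are assumptions added so the batch is self-contained and deterministic.

```sql
-- Sample data from the question, held in a table variable (assumed name @t).
DECLARE @t TABLE (Customer int, [User] varchar(20), Revenue int);
INSERT INTO @t (Customer, [User], Revenue) VALUES
    (1, 'James', 500), (2, 'James', 750), (3, 'James', 450),
    (4, 'Sarah', 100), (5, 'Sarah', 500), (6, 'Sarah', 150),
    (7, 'Sarah', 600), (8, 'James', 150), (9, 'James', 100);

WITH cte AS
(
    SELECT Customer, [User], Revenue,
           -- share of this user's total revenue
           percentile = 1.0 * Revenue
                        / NULLIF(SUM(Revenue) OVER (PARTITION BY [User]), 0),
           -- cumulative share, largest customers first
           -- (Customer as tie-breaker is an assumption, not from the original)
           running_percentile = 1.0 * SUM(Revenue) OVER (PARTITION BY [User]
                                          ORDER BY Revenue DESC, Customer
                                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                                / NULLIF(SUM(Revenue) OVER (PARTITION BY [User]), 0)
    FROM @t
)
SELECT *
FROM cte
WHERE running_percentile <= 0.8   -- keep rows while the running share is at most 80%
ORDER BY [User], Revenue DESC;
```

On the sample data this filter keeps two rows for James and one for Sarah, and it drops the row that crosses the 80% mark.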
Output:
╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User ║ Revenue ║ percentile ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║ 2 ║ James ║ 750 ║ 0,384615384615 ║ 0,384615384615 ║
║ 1 ║ James ║ 500 ║ 0,256410256410 ║ 0,641025641025 ║
║ 7 ║ Sarah ║ 600 ║ 0,444444444444 ║ 0,444444444444 ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝
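The next output also includes the row that pushes each user past 80%, which is what the asker requested in the comments. The query that produced it is not preserved here. One possible sketch, again assuming SQL Server 2012+, a table variable `@t`, and a `Customer` tie-breaker, keeps a row whenever the running share before that row is still under 0.8:

```sql
-- Sample data from the question, held in a table variable (assumed name @t).
DECLARE @t TABLE (Customer int, [User] varchar(20), Revenue int);
INSERT INTO @t (Customer, [User], Revenue) VALUES
    (1, 'James', 500), (2, 'James', 750), (3, 'James', 450),
    (4, 'Sarah', 100), (5, 'Sarah', 500), (6, 'Sarah', 150),
    (7, 'Sarah', 600), (8, 'James', 150), (9, 'James', 100);

WITH cte AS
(
    SELECT Customer, [User], Revenue,
           percentile = 1.0 * Revenue
                        / NULLIF(SUM(Revenue) OVER (PARTITION BY [User]), 0),
           running_percentile = 1.0 * SUM(Revenue) OVER (PARTITION BY [User]
                                          ORDER BY Revenue DESC, Customer
                                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                                / NULLIF(SUM(Revenue) OVER (PARTITION BY [User]), 0)
    FROM @t
)
SELECT *
FROM cte
-- running_percentile - percentile is the running share *before* this row;
-- while that is under 80%, the row is still needed to reach the threshold.
WHERE running_percentile - percentile < 0.8
ORDER BY [User], Revenue DESC;
```

With this condition, James's 450 row (running share about 0.87) and Sarah's 500 row (about 0.81) are kept, and everything after them is dropped.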
╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User ║ Revenue ║ percentile ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║ 2 ║ James ║ 750 ║ 0,384615384615 ║ 0,384615384615 ║
║ 1 ║ James ║ 500 ║ 0,256410256410 ║ 0,641025641025 ║
║ 3 ║ James ║ 450 ║ 0,230769230769 ║ 0,871794871794 ║
║ 7 ║ Sarah ║ 600 ║ 0,444444444444 ║ 0,444444444444 ║
║ 5 ║ Sarah ║ 500 ║ 0,370370370370 ║ 0,814814814814 ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝
Edit 2:
SQL Server 2008 does not support everything in the OVER clause, but ROW_NUMBER is supported.
First, the cte only computes each row's position within its group:
╔═══════════╦════════╦══════════╦════╗
║ Customer ║ User ║ Revenue ║ rn ║
╠═══════════╬════════╬══════════╬════╣
║ 2 ║ James ║ 750 ║ 1 ║
║ 1 ║ James ║ 500 ║ 2 ║
║ 3 ║ James ║ 450 ║ 3 ║
║ 8 ║ James ║ 150 ║ 4 ║
║ 9 ║ James ║ 100 ║ 5 ║
║ 7 ║ Sarah ║ 600 ║ 1 ║
║ 5 ║ Sarah ║ 500 ║ 2 ║
║ 6 ║ Sarah ║ 150 ║ 3 ║
║ 4 ║ Sarah ║ 100 ║ 4 ║
╚═══════════╩════════╩══════════╩════╝
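A first CTE of this kind could be sketched as the query below. It is an illustration rather than the answerer's exact code: the table variable `@t` and the extra `Customer` tie-breaker are assumptions.

```sql
-- Sample data from the question, held in a table variable (assumed name @t).
DECLARE @t TABLE (Customer int, [User] varchar(20), Revenue int);
INSERT INTO @t (Customer, [User], Revenue) VALUES
    (1, 'James', 500), (2, 'James', 750), (3, 'James', 450),
    (4, 'Sarah', 100), (5, 'Sarah', 500), (6, 'Sarah', 150),
    (7, 'Sarah', 600), (8, 'James', 150), (9, 'James', 100);

-- Number each user's customers from highest to lowest revenue.
-- ROW_NUMBER() with PARTITION BY and ORDER BY is available in SQL Server 2008.
SELECT Customer, [User], Revenue,
       rn = ROW_NUMBER() OVER (PARTITION BY [User]
                               ORDER BY Revenue DESC, Customer)
FROM @t
ORDER BY [User], rn;
```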
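The second cte:
The c2 subquery computes the running total based on the ROW_NUMBER rank.
The c3 subquery computes the full total for each user.
In the final query, the s subquery finds the lowest running total that is at least 80%.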
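Putting those steps together, the ROW_NUMBER-based version might look like the sketch below. The aliases c2, c3 and s follow the description above; the name `cte2`, the table variable `@t`, and the `Customer` tie-breaker (which keeps the numbering stable when the CTE is referenced more than once) are assumptions.

```sql
-- Sample data from the question, held in a table variable (assumed name @t).
DECLARE @t TABLE (Customer int, [User] varchar(20), Revenue int);
INSERT INTO @t (Customer, [User], Revenue) VALUES
    (1, 'James', 500), (2, 'James', 750), (3, 'James', 450),
    (4, 'Sarah', 100), (5, 'Sarah', 500), (6, 'Sarah', 150),
    (7, 'Sarah', 600), (8, 'James', 150), (9, 'James', 100);

WITH cte AS
(
    -- position of each customer within its user, highest revenue first
    SELECT Customer, [User], Revenue,
           rn = ROW_NUMBER() OVER (PARTITION BY [User]
                                   ORDER BY Revenue DESC, Customer)
    FROM @t
), cte2 AS
(
    SELECT c.Customer, c.[User], c.Revenue,
           percentile         = 1.0 * c.Revenue / NULLIF(c3.s, 0),
           running_percentile = 1.0 * c2.s / NULLIF(c3.s, 0)
    FROM cte c
    CROSS APPLY (SELECT SUM(x.Revenue) AS s      -- running total up to this rank
                 FROM cte x
                 WHERE x.[User] = c.[User] AND x.rn <= c.rn) AS c2
    CROSS APPLY (SELECT SUM(x.Revenue) AS s      -- full total for the user
                 FROM cte x
                 WHERE x.[User] = c.[User]) AS c3
)
SELECT a.*
FROM cte2 a
CROSS APPLY (SELECT MIN(b.running_percentile) AS rp   -- lowest running share >= 80%
             FROM cte2 b
             WHERE b.running_percentile >= 0.8
               AND b.[User] = a.[User]) AS s
WHERE a.running_percentile <= s.rp                    -- rows up to and including the crossing row
ORDER BY a.[User], a.Revenue DESC;
```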
Edit 3:
Using ROW_NUMBER is actually redundant:
WITH cte AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
          ,percentile         = 1.0 * c.Revenue / NULLIF(c3.s, 0)   -- share of the user's total
          ,running_percentile = 1.0 * c2.s / NULLIF(c3.s, 0)        -- cumulative share, highest revenue first
    FROM t c
    CROSS APPLY
        (SELECT SUM(Revenue) AS s        -- running total: same user's rows with revenue >= this row's
         FROM t c2
         WHERE c.[User] = c2.[User]
           AND c2.Revenue >= c.Revenue) c2
    CROSS APPLY
        (SELECT SUM(Revenue) AS s        -- full total for the user
         FROM t c2
         WHERE c.[User] = c2.[User]) AS c3
)
SELECT a.*
FROM cte a
CROSS APPLY (SELECT MIN(running_percentile) AS rp   -- lowest running share that is at least 80%
             FROM cte c2
             WHERE running_percentile >= 0.8
               AND c2.[User] = a.[User]) AS s
WHERE a.running_percentile <= s.rp                  -- keep rows up to and including the crossing row
ORDER BY [User], Revenue DESC;
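In SQL Server 2012+ you would use a cumulative sum, which is more efficient. In SQL Server 2008 you can do this with a correlated subquery or CROSS APPLY, as in the select ... cross apply query at the end of this page.
Note: the * 1.0 is only there in case Revenue is stored as an integer. SQL Server performs integer division, which would return 0 for both columns on almost every row.
Edit:
If you only want the results for James, add where [User] = 'James'.
While the [% of Total] column seems to work for a single user, the running total seems to be all over the place.
@bendataclear. Your original question only had one user. Adjusting this to a running total per user is trivial. And it is much simpler than lad's answer.
First: the SUM around t.Revenue is not needed. This does not work because there is no GROUP BY, or am I missing something? Second: user should be quoted as [user], otherwise you will get an error. Third: SUM OVER calculates the percentage over the whole table, not per user. And there is no filtering at all.
This certainly works. It is an APPLY that uses an aggregate for each row. You might want to look at the documentation on APPLY, or try it yourself.
@GordonLinoff Please check. Even after removing the SUM and wrapping user in [], the percentage is still computed from SUM(t.Revenue) OVER the whole table. The point is that in its current form the code does not even run.
Looks almost there. The only problem is that it misses the last row: James's third row goes over 0.80 but needs to be included. If that is not possible, it is not a disaster though.
Looks perfect. I translated it to my large table and it returns what I need. I spent a whole 5 minutes going through it and still cannot understand what you have done! Thanks a lot.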
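-- [User] is bracketed because USER is a reserved word in SQL Server;
-- the outer values are not wrapped in SUM() because there is no GROUP BY.
select t.*,
       t.Revenue * 1.0 / sum(t.Revenue) over (partition by t.[User]) as [% of Total],
       t2.RunningRevenue * 1.0 / sum(t.Revenue) over (partition by t.[User]) as [Running Total %]
from t cross apply
     (select sum(t2.Revenue) as RunningRevenue   -- running total within the same user
      from t t2
      where t2.Revenue >= t.Revenue and t2.[User] = t.[User]
     ) t2;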