SQL OVER() clause - when and why is it useful?
USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
    ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
    ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
    ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
    ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN(43659,43664);
I read about that clause and I don't understand why I need it. What does the OVER function do? What does PARTITION BY do? Why can't I write the query with GROUP BY SalesOrderID?
The OVER clause is powerful in that you can have aggregates over different ranges ("windowing"), whether or not you use a GROUP BY.
Example: get the count per SalesOrderID and the count of all
SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) AS 'Count'
    ,COUNT(*) OVER () AS 'CountAll'
FROM Sales.SalesOrderDetail
WHERE
    SalesOrderID IN(43659,43664)
GROUP BY
    SalesOrderID, ProductID, OrderQty
Get different COUNTs, with no GROUP BY
SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'CountQtyPerOrder'
    ,COUNT(OrderQty) OVER(PARTITION BY ProductID) AS 'CountQtyPerProduct'
    ,COUNT(*) OVER () AS 'CountAllAgain'
FROM Sales.SalesOrderDetail
WHERE
    SalesOrderID IN(43659,43664)
The OVER clause, when combined with PARTITION BY, states that the preceding function call must be evaluated analytically over the rows returned by the query. Think of it as an inline GROUP BY.
OVER (PARTITION BY SalesOrderID)
says that the SUM, AVG, etc. function should return its value over a subset of the records returned by the query, with that subset partitioned by the foreign key SalesOrderID.
So we SUM every OrderQty record for each unique SalesOrderID, and the resulting column is named 'Total'.
This is a much more efficient approach than using multiple inline views to find the same information. You can put this query inside an inline view and then filter on Total:
SELECT ...
FROM (your query) inlineview
WHERE Total < 200
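To make the contrast with GROUP BY concrete, here is a minimal sketch that runs the same kind of queries through Python's built-in sqlite3 module. It rests on several assumptions: SQLite (3.25 or later, which added window functions) stands in for SQL Server, and the table sales_order_detail with its five rows is invented for illustration rather than taken from AdventureWorks.

```python
import sqlite3

# Hypothetical stand-in for Sales.SalesOrderDetail; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_order_detail (SalesOrderID INT, ProductID INT, OrderQty INT)"
)
conn.executemany(
    "INSERT INTO sales_order_detail VALUES (?, ?, ?)",
    [(1, 10, 2), (1, 11, 3), (2, 10, 5), (2, 12, 1), (2, 13, 4)],
)

# GROUP BY collapses each order to a single row.
grouped = conn.execute(
    "SELECT SalesOrderID, SUM(OrderQty) FROM sales_order_detail "
    "GROUP BY SalesOrderID ORDER BY SalesOrderID"
).fetchall()

# OVER (PARTITION BY ...) keeps every detail row and adds the per-order total.
windowed_sql = (
    "SELECT SalesOrderID, ProductID, OrderQty, "
    "SUM(OrderQty) OVER (PARTITION BY SalesOrderID) AS Total "
    "FROM sales_order_detail"
)
windowed = conn.execute(
    windowed_sql + " ORDER BY SalesOrderID, ProductID"
).fetchall()

# Wrapping the windowed query in an inline view allows filtering on Total.
small_orders = conn.execute(
    "SELECT * FROM (" + windowed_sql + ") AS inlineview "
    "WHERE Total < 8 ORDER BY SalesOrderID, ProductID"
).fetchall()

print(grouped)
print(windowed)
print(small_orders)
```

The grouped query returns one row per order, while the windowed query returns all five detail rows, each carrying its order's total. The threshold of 8 is arbitrary and only chosen so that one of the two invented orders is filtered out.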
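If you only wanted to GROUP BY SalesOrderID, you wouldn't be able to include the ProductID and OrderQty columns in the SELECT clause.
The PARTITION BY clause lets you break up your aggregate functions. One obvious and useful example is generating line numbers for the order lines on an order:
SELECT
    O.order_id,
    O.order_date,
    ROW_NUMBER() OVER(PARTITION BY O.order_id) AS line_item_no,
    OL.product_id
FROM
    Orders O
INNER JOIN Order_Lines OL ON OL.order_id = O.order_id
(My syntax might be slightly off; SQL Server, for example, also requires an ORDER BY inside the OVER clause for ROW_NUMBER.)
You would then get back something like:
order_id  order_date  line_item_no  product_id
--------  ----------  ------------  ----------
       1  2011-05-02             1           5
       1  2011-05-02             2           4
       1  2011-05-02             3           7
       2  2011-05-12             1           8
       2  2011-05-12             2           1
You can use GROUP BY SalesOrderID. The difference is that with GROUP BY you can only have aggregated values for the columns that are not included in GROUP BY.
In contrast, with windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values. That is, although you are not doing so in your example query, you could retrieve both the individual OrderQty values and their sums, counts, averages, etc. over groups of the same SalesOrderID.
Here is a practical example of why windowed aggregates are great. Suppose you need to calculate what percentage of a total each value is. Without windowed aggregates, you would first have to derive a list of aggregated values and then join it back to the original rowset, like this:
SELECT
    orig.[Partition],
    orig.Value,
    orig.Value * 100.0 / agg.TotalValue AS ValuePercent
FROM OriginalRowset orig
INNER JOIN (
    SELECT
        [Partition],
        SUM(Value) AS TotalValue
    FROM OriginalRowset
    GROUP BY [Partition]
) agg ON orig.[Partition] = agg.[Partition]
Now look at how you can do the same with a windowed aggregate:
SELECT
    [Partition],
    Value,
    Value * 100.0 / SUM(Value) OVER (PARTITION BY [Partition]) AS ValuePercent
FROM OriginalRowset orig
Much easier and cleaner, isn't it?
Let me explain with an example, and you will see how it works.
Suppose you have the following table, DIM_EQUIPMENT: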
VIN       MAKE   MODEL    YEAR  COLOR
-----------------------------------------
1234ASDF  Ford   Taurus   2008  White
1234JKLM  Chevy  Truck    2005  Green
5678ASDF  Ford   Mustang  2008  Yellow
Run the SQL below:
SELECT VIN,
    MAKE,
    MODEL,
    YEAR,
    COLOR,
    COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2
FROM DIM_EQUIPMENT
The result is as follows:
VIN       MAKE   MODEL    YEAR  COLOR   COUNT2
----------------------------------------------
1234JKLM  Chevy  Truck    2005  Green   1
5678ASDF  Ford   Mustang  2008  Yellow  2
1234ASDF  Ford   Taurus   2008  White   2
See what happened? You can count by YEAR without a GROUP BY and still match the count to each row.
Another interesting way to get the same result is to use a WITH clause. WITH works as an inline view and can simplify queries, especially complex ones, although that is not the case here since I am only trying to show the usage.
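The WITH version mentioned above is not shown in the text, so the sketch below is only a guess at what such a query could look like: a CTE (here named year_counts, a made-up name) that counts per YEAR and is joined back to the rows. It runs against the three DIM_EQUIPMENT rows from the example using Python's sqlite3, assuming SQLite 3.25+ in place of the original database, and checks that both forms return the same counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DIM_EQUIPMENT (VIN TEXT, MAKE TEXT, MODEL TEXT, YEAR INT, COLOR TEXT)"
)
conn.executemany(
    "INSERT INTO DIM_EQUIPMENT VALUES (?, ?, ?, ?, ?)",
    [
        ("1234ASDF", "Ford", "Taurus", 2008, "White"),
        ("1234JKLM", "Chevy", "Truck", 2005, "Green"),
        ("5678ASDF", "Ford", "Mustang", 2008, "Yellow"),
    ],
)

# Window version: one count per YEAR, repeated on every row of that year.
window_rows = conn.execute(
    "SELECT VIN, YEAR, COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2 "
    "FROM DIM_EQUIPMENT ORDER BY VIN"
).fetchall()

# Assumed WITH version: aggregate per year in a CTE, then join it back.
cte_rows = conn.execute(
    "WITH year_counts AS ("
    "  SELECT YEAR, COUNT(*) AS COUNT2 FROM DIM_EQUIPMENT GROUP BY YEAR"
    ") "
    "SELECT e.VIN, e.YEAR, c.COUNT2 "
    "FROM DIM_EQUIPMENT e JOIN year_counts c ON c.YEAR = e.YEAR "
    "ORDER BY e.VIN"
).fetchall()

print(window_rows)
print(cte_rows)
```

The window form needs a single pass over one table reference, while the CTE form has to aggregate and then join, which is the extra step the OVER clause removes.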
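- Also called the query partition clause
- Similar to the GROUP BY clause
- Breaks the data up into chunks (or partitions)
- Partitions are separated by partition boundaries
- The function is performed within each partition
- The function is re-initialised when crossing a partition boundary
Syntax:
function (...) OVER (PARTITION BY col1, col3, ...)
- Functions
- Familiar functions such as COUNT(), SUM(), MIN(), MAX(), etc.
- New functions as well (e.g. ROW_NUMBER(), RATIO_TO_REPORT(), etc.)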
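The point that a function is re-initialised at each partition boundary can be sketched with ROW_NUMBER(). The example below is an illustration only: the order_lines table and its rows are hypothetical, and the queries run on SQLite (3.25+) through Python's sqlite3 rather than on any particular server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: two orders with three and two lines respectively.
conn.execute("CREATE TABLE order_lines (order_id INT, product_id INT)")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?)",
    [(1, 5), (1, 4), (1, 7), (2, 8), (2, 1)],
)

# ROW_NUMBER restarts at 1 in every partition; COUNT(*) is computed
# once per partition and repeated on each of its rows.
numbered = conn.execute(
    "SELECT order_id, product_id, "
    "ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY product_id) AS line_no, "
    "COUNT(*) OVER (PARTITION BY order_id) AS lines_in_order "
    "FROM order_lines ORDER BY order_id, line_no"
).fetchall()

for row in numbered:
    print(row)
```

Order 1 gets line numbers 1 to 3 and order 2 starts again at 1, which is the re-initialisation the list describes.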
Below is the result of a query. The source table is the same, except that it lacks the last column, which is a moving sum of the third one (cash):
prkey  whatsthat            cash
890    "abb "               32    32
43     "abbz "              2     34
4      "bttu "              1     35
45     "gasstuff "          2     37
545    "gasz "              5     42
80009  "hoo "               9     51
2321   "ibm "               1     52
998    "krk "               2     54
42     "kx-5010 "           2     56
32     "lto "               4     60
543    "mp "                5     65
465    "multipower "        2     67
455    "O.N. "              1     68
7887   "prem "              7     75
434    "puma "              3     78
23     "retractble "        3     81
242    "Trujillo's stuff "  4     85
(The table is public.iuk.)
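The query that produced this output is not preserved in the text. A query of roughly the following shape would give a moving sum like the one shown; this is an assumption reconstructed from the column names and the ordering of the output, not the author's actual statement. The sketch loads the first five rows into SQLite (3.25+) via Python's sqlite3, with the table named iuk and the column alias moving_sum made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iuk (prkey INT, whatsthat TEXT, cash INT)")
# First five rows of the table above, with the padding trimmed.
conn.executemany(
    "INSERT INTO iuk VALUES (?, ?, ?)",
    [(890, "abb", 32), (43, "abbz", 2), (4, "bttu", 1),
     (45, "gasstuff", 2), (545, "gasz", 5)],
)

# SUM(...) OVER (ORDER BY ...) accumulates cash in whatsthat order.
moving = conn.execute(
    "SELECT prkey, whatsthat, cash, "
    "SUM(cash) OVER (ORDER BY whatsthat) AS moving_sum "
    "FROM iuk ORDER BY whatsthat"
).fetchall()

for row in moving:
    print(row)
```

The last column reproduces the first five values of the fourth column in the output above: 32, 34, 35, 37, 42.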
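This goes a little beyond the dBase (1986) level; I don't know why it took more than 25 years to get there.
In simple words:
The OVER clause can be used to select non-aggregated values along with aggregated ones.
PARTITION BY, ORDER BY, and ROWS or RANGE are all parts of the OVER() clause.
PARTITION BY partitions the data before the window or aggregate functions are applied; without PARTITION BY, the entire result set is treated as a single partition.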
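As a small illustration of the last point, the sketch below computes the same SUM twice: once with PARTITION BY gender and once with an empty OVER(), where the whole result set acts as a single partition. It borrows the 12-row employees sample that this answer works with and, as an assumption, runs it on SQLite (3.25+) through Python's sqlite3 instead of SQL Server; the aliases gender_sum and total_sum are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INT, name TEXT, gender TEXT, salary INT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Mark", "Male", 5000), (2, "John", "Male", 4500),
        (3, "Pavan", "Male", 5000), (4, "Pam", "Female", 5500),
        (5, "Sara", "Female", 4000), (6, "Aradhya", "Female", 3500),
        (7, "Tom", "Male", 5500), (8, "Mary", "Female", 5000),
        (9, "Ben", "Male", 6500), (10, "Jodi", "Female", 7000),
        (11, "Tom", "Male", 5500), (12, "Ron", "Male", 5000),
    ],
)

# Every row is kept; each carries its partition's sum and the overall sum.
rows = conn.execute(
    "SELECT id, gender, salary, "
    "SUM(salary) OVER (PARTITION BY gender) AS gender_sum, "
    "SUM(salary) OVER () AS total_sum "
    "FROM employees ORDER BY id"
).fetchall()

for row in rows:
    print(row)
```

Each of the 12 rows comes back with its own salary, the total for its gender, and the grand total of 62000.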
The OVER clause can be used with ranking functions (RANK, ROW_NUMBER, DENSE_RANK, ...), aggregate functions (AVG, MAX, MIN, SUM, ...) and analytic functions (FIRST_VALUE, LAST_VALUE, and a few others).
Let's look at the basic syntax of the OVER clause:
OVER (
    [ <PARTITION BY clause> ]
    [ <ORDER BY clause> ]
    [ <ROW or RANGE clause> ]
)
SQL version: 2012
The examples use this employees table:
Id Name Gender Salary
----------- -------------------------------------------------- ---------- -----------
1 Mark Male 5000
2 John Male 4500
3 Pavan Male 5000
4 Pam Female 5500
5 Sara Female 4000
6 Aradhya Female 3500
7 Tom Male 5500
8 Mary Female 5000
9 Ben Male 6500
10 Jodi Female 7000
11 Tom Male 5500
12 Ron Male 5000
So let me run through different scenarios and see how the data is affected, moving from the most verbose syntax to the simplest.
Select *,SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6 Aradhya Female 3500 3500
5 Sara Female 4000 7500
2 John Male 4500 12000
3 Pavan Male 5000 32000
1 Mark Male 5000 32000
8 Mary Female 5000 32000
12 Ron Male 5000 32000
11 Tom Male 5500 48500
7 Tom Male 5500 48500
4 Pam Female 5500 48500
9 Ben Male 6500 55000
10 Jodi Female 7000 62000
Just look at the sum_sal column. Here I order by salary and use "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW".
In this case we are not using PARTITION BY, so the entire data set is treated as one partition, and we order by salary.
The important part is UNBOUNDED PRECEDING AND CURRENT ROW: for each row, the sum runs from the first row up to the current row.
But look at the rows with salary 5000: for name = "Pavan" you would ideally expect 17000, and for name = "Mark" 22000. Because we are using RANGE, rows with equal ORDER BY values are treated as one logical group; the operation is computed over the whole group, and the result is assigned to every row in it. That is why all rows with salary = 5000 have the same value, 32000.
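With ROWS instead of RANGE, rows with equal values are not grouped together; the sum is computed from the first row up to each current row:
Select *,SUM(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees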
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6 Aradhya Female 3500 3500
5 Sara Female 4000 7500
2 John Male 4500 12000
3 Pavan Male 5000 17000
1 Mark Male 5000 22000
8 Mary Female 5000 27000
12 Ron Male 5000 32000
11 Tom Male 5500 37500
7 Tom Male 5500 43000
4 Pam Female 5500 48500
9 Ben Male 6500 55000
10 Jodi Female 7000 62000
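The ROWS versus RANGE difference can be checked side by side. The following sketch uses only the salaries from the employees table and runs on SQLite (3.25+) through Python's sqlite3, which is an assumption standing in for SQL Server; the column aliases rows_sum and range_sum are made up. Because the order among tied salaries is not defined, the output is sorted by salary and then by the ROWS sum so that it is reproducible.

```python
import sqlite3

# Salaries of the 12 employees, in Id order.
salaries = [5000, 4500, 5000, 5500, 4000, 3500, 5500, 5000, 6500, 7000, 5500, 5000]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INT, salary INT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)", list(enumerate(salaries, start=1))
)

# Same ORDER BY and frame bounds; only the frame unit differs.
rows = conn.execute(
    "SELECT salary, "
    "SUM(salary) OVER (ORDER BY salary "
    "  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum, "
    "SUM(salary) OVER (ORDER BY salary "
    "  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum "
    "FROM employees ORDER BY salary, rows_sum"
).fetchall()

for row in rows:
    print(row)
```

For the four rows with salary 5000, the ROWS sum steps through 17000, 22000, 27000 and 32000, while the RANGE sum is 32000 on all four, matching the two result tables above.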
With only an ORDER BY inside OVER, and no ROWS or RANGE:
Select *,SUM(salary) Over(order by salary) as sum_sal from employees
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6 Aradhya Female 3500 3500
5 Sara Female 4000 7500
2 John Male 4500 12000
3 Pavan Male 5000 32000
1 Mark Male 5000 32000
8 Mary Female 5000 32000
12 Ron Male 5000 32000
11 Tom Male 5500 48500
7 Tom Male 5500 48500
4 Pam Female 5500 48500
9 Ben Male 6500 55000
10 Jodi Female 7000 62000
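These results are the same as those of:
Select *, SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees
That is because, when ORDER BY is specified without ROWS or RANGE, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.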
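The default frame can be verified with a short check. This is a sketch on SQLite (3.25+) via Python's sqlite3, assumed here as a stand-in for SQL Server; the table pay and its four salaries are invented so that one tie is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one tied salary (5000 appears twice).
conn.execute("CREATE TABLE pay (salary INT)")
conn.executemany("INSERT INTO pay VALUES (?)", [(3500,), (4000,), (5000,), (5000,)])

# implicit_sum relies on the default frame; explicit_sum spells it out.
frames = conn.execute(
    "SELECT salary, "
    "SUM(salary) OVER (ORDER BY salary) AS implicit_sum, "
    "SUM(salary) OVER (ORDER BY salary "
    "  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS explicit_sum "
    "FROM pay ORDER BY salary"
).fetchall()

for row in frames:
    print(row)
```

Both columns agree on every row, including the tied rows, which both show the sum through the end of the tie (17500).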
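To compute the value over the entire partition for every row, from the first row to the last, use UNBOUNDED FOLLOWING instead of CURRENT ROW:
Select *,sum(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_sal from employees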
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1 Mark Male 5000 62000
2 John Male 4500 62000
3 Pavan Male 5000 62000
4 Pam Female 5500 62000
5 Sara Female 4000 62000
6 Aradhya Female 3500 62000
7 Tom Male 5500 62000
8 Mary Female 5000 62000
9 Ben Male 6500 62000
10 Jodi Female 7000 62000
11 Tom Male 5500 62000
12 Ron Male 5000 62000
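OVER() with empty parentheses gives the same totals: the whole result set is treated as a single partition, and the aggregate runs from its first row to its last:
Select *,Sum(salary) Over() as sum_sal from employees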
Id Name Gender Salary sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1 Mark Male 5000 62000
2 John Male 4500 62000
3 Pavan Male 5000 62000
4 Pam Female 5500 62000
5 Sara Female 4000 62000
6 Aradhya Female 3500 62000
7 Tom Male 5500 62000
8 Mary Female 5000 62000
9 Ben Male 6500 62000
10 Jodi Female 7000 62000
11 Tom Male 5500 62000
12 Ron Male 5000 62000
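That the empty OVER() and the fully unbounded frame give the same totals can be checked directly. The sketch below again assumes SQLite (3.25+) through Python's sqlite3 in place of SQL Server, with a made-up four-row table pay.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the four salaries add up to 17500.
conn.execute("CREATE TABLE pay (salary INT)")
conn.executemany("INSERT INTO pay VALUES (?)", [(3500,), (4000,), (5000,), (5000,)])

# Both windows cover the whole result set for every row.
totals = conn.execute(
    "SELECT salary, "
    "SUM(salary) OVER () AS empty_over, "
    "SUM(salary) OVER (ORDER BY salary "
    "  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS full_frame "
    "FROM pay ORDER BY salary"
).fetchall()

for row in totals:
    print(row)
```

Every row carries the same grand total in both columns, so for an aggregate like SUM the shorter OVER() is enough when no ordering or partitioning is needed.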