Getting the items of each group after GROUP BY in SQL Server
I have a table with the following columns:
UserID1, UserID2, ProductID, PurchaseDate
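The following query runs against the Purchases table and returns the pairs of users who interacted more than @threshold times in the past 31 days, regardless of which user appears in which column: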
DECLARE @threshold AS INT
DECLARE @days AS INT
SET @threshold = 10
SET @days = 31
SELECT
    UserID1, UserID2, COUNT(*) AS Counter
FROM
    (SELECT
        -- swap the columns so that (Col1, Col2) and (Col2, Col1) count as the same pair
        CASE
            WHEN UserID1 < UserID2
            THEN UserID1
            ELSE UserID2
        END AS UserID1,
        CASE
            WHEN UserID1 < UserID2
            THEN UserID2
            ELSE UserID1
        END AS UserID2
    FROM
        Purchases WITH(NOLOCK)
    WHERE
        Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()) t
GROUP BY
    UserID1, UserID2
HAVING
    COUNT(*) > @threshold
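To try the pair-counting query without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. Assumptions: SQLite stands in for SQL Server (so there is no NOLOCK hint, and the window bounds are passed as parameters instead of DATEADD/GETDATE), the date filter uses PurchaseDate because Deadline is not among the listed columns, and the sample rows, the fixed "now" of 2017-01-20 and a threshold of 2 are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (UserID1, UserID2, ProductID, PurchaseDate)
ROWS = [
    (1, 2, 12345, "2017-01-18 00:13:52"),
    (2, 1, 5425, "2017-01-12 15:10:02"),
    (1, 2, 64362, "2017-01-05 10:10:02"),
    (1, 2, 777, "2016-10-01 00:00:00"),    # outside the 31-day window
    (3, 2, 25235, "2017-01-18 00:13:52"),
    (2, 3, 436346, "2017-01-14 00:13:52"),
    (4, 1, 23523, "2016-11-14 00:13:52"),  # outside the 31-day window
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases (UserID1 INTEGER, UserID2 INTEGER, "
    "ProductID INTEGER, PurchaseDate TEXT)"
)
conn.executemany("INSERT INTO Purchases VALUES (?, ?, ?, ?)", ROWS)

# Same shape as the T-SQL query: order each pair with CASE, then GROUP BY / HAVING.
PAIR_COUNTS = """
SELECT UserID1, UserID2, COUNT(*) AS Counter
FROM (
    SELECT
        CASE WHEN UserID1 < UserID2 THEN UserID1 ELSE UserID2 END AS UserID1,
        CASE WHEN UserID1 < UserID2 THEN UserID2 ELSE UserID1 END AS UserID2
    FROM Purchases
    WHERE PurchaseDate BETWEEN :start AND :now
) AS t
GROUP BY UserID1, UserID2
HAVING COUNT(*) > :threshold
ORDER BY UserID1, UserID2
"""

# ISO-formatted text dates compare correctly as strings.
params = {"start": "2016-12-20 00:00:00", "now": "2017-01-20 00:00:00", "threshold": 2}
pair_counts = conn.execute(PAIR_COUNTS, params).fetchall()
print(pair_counts)  # [(1, 2, 3)]
```

Only the pair (1, 2) has more than two purchases inside the window; (2, 3) has exactly two, and the older rows are filtered out.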
However, what I want is to return a table with each ProductID and PurchaseDate on its own row, like this:
UserID1  UserID2  ProductID  PurchaseDate
1        2        12345      2017-01-18 00:13:52
1        2        5425       2017-01-12 15:10:02
1        2        64362      2017-01-05 10:10:02
..... and so on for all 10 interactions
3        2        25235      2017-01-18 00:13:52
3        2        436346     2017-01-14 00:13:52
..... and so on for all 5 interactions
4        1        23523      2017-01-14 00:13:52
4        1        135135     2017-01-09 00:13:52
..... and so on for all 8 interactions
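Is there a way to do this without putting the results of the first query into a temporary table and then joining them back to the Purchases table to find all the purchases?
Disclaimer: I have not tested this code; it was written outside a T-SQL IDE.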
The code below assumes that UserID1 != UserID2.
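1. I suggest using a MAX/MIN solution so that [Col1, Col2] is handled the same way as [Col2, Col1]. It may perform better, and it handles NULLs correctly. You need SQL Server 2008 or later for this to work, because it relies on the VALUES table value constructor: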
SELECT
    (SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS UserID1,
    (SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS UserID2
FROM
    Purchases
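To make the NULL claim concrete, here is a small Python model (not SQL Server itself) of the two per-row expressions, with None playing the role of NULL. It assumes standard SQL semantics: a comparison involving NULL is not true, so a CASE falls through to ELSE, and aggregate MAX/MIN skip NULLs. The function names are hypothetical.

```python
def case_pair(user1, user2):
    """Model of the two CASE expressions from the question (None plays the role of NULL)."""
    # A comparison involving NULL is UNKNOWN, so the WHEN branch is skipped and ELSE runs.
    is_less = user1 is not None and user2 is not None and user1 < user2
    return (user1, user2) if is_less else (user2, user1)


def max_min_pair(user1, user2):
    """Model of MAX(usr) / MIN(usr) over VALUES (UserID1), (UserID2)."""
    # Aggregate MAX/MIN ignore NULLs and return NULL only when every input is NULL.
    non_null = [v for v in (user1, user2) if v is not None]
    if not non_null:
        return (None, None)
    return (max(non_null), min(non_null))


# Both versions map (1, 2) and (2, 1) to one key (in opposite column orders).
print(case_pair(1, 2), case_pair(2, 1))        # (1, 2) (1, 2)
print(max_min_pair(1, 2), max_min_pair(2, 1))  # (2, 1) (2, 1)

# With a NULL, only the MAX/MIN version still maps both orders to the same key.
print(case_pair(5, None), case_pair(None, 5))        # (None, 5) (5, None)
print(max_min_pair(5, None), max_min_pair(None, 5))  # (5, 5) (5, 5)
```

Whether collapsing a half-NULL pair to (5, 5) is what you want depends on what a NULL user ID means in your data.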
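2. Now we need to count the interactions within each pair, which should be easy. To keep the code clean, we can wrap the previous statement in a CTE; I have also added the Deadline filter here: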
;WITH CTE_UserInteractions AS (
    SELECT
        (SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS FirstUser,
        (SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS SecondUser
    FROM
        Purchases
    WHERE
        Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
)
SELECT
    FirstUser,
    SecondUser
FROM
    CTE_UserInteractions
GROUP BY
    FirstUser, SecondUser
HAVING
    COUNT(*) > @threshold
A note here: you may find that computing the lower Deadline boundary in advance improves performance. For example, before running the batch we could do:
DECLARE @StartDate DATETIME = DATEADD(DAY,-@days,GETDATE())
and then use @StartDate in the WHERE clause.
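Here is a sketch of steps 1 and 2 together with a precomputed start date, again using Python's sqlite3 module as a stand-in for SQL Server. SQLite's two-argument max() and min() are scalar functions and replace the MAX/MIN over VALUES here; unlike the T-SQL version, they return NULL if either argument is NULL. The lower bound is computed once in Python and passed as a parameter, in the spirit of @StartDate. The sample rows, fixed "now", threshold of 2 and the use of PurchaseDate as the date column are all assumptions.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical sample rows: (UserID1, UserID2, ProductID, PurchaseDate)
ROWS = [
    (1, 2, 12345, "2017-01-18 00:13:52"),
    (2, 1, 5425, "2017-01-12 15:10:02"),
    (1, 2, 64362, "2017-01-05 10:10:02"),
    (1, 2, 777, "2016-10-01 00:00:00"),
    (3, 2, 25235, "2017-01-18 00:13:52"),
    (2, 3, 436346, "2017-01-14 00:13:52"),
    (4, 1, 23523, "2016-11-14 00:13:52"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases (UserID1 INTEGER, UserID2 INTEGER, "
    "ProductID INTEGER, PurchaseDate TEXT)"
)
conn.executemany("INSERT INTO Purchases VALUES (?, ?, ?, ?)", ROWS)

FMT = "%Y-%m-%d %H:%M:%S"
now = datetime(2017, 1, 20)       # fixed "current time" so the example is repeatable
start = now - timedelta(days=31)  # computed once, like @StartDate

INTERACTIONS = """
WITH CTE_UserInteractions AS (
    SELECT
        max(UserID1, UserID2) AS FirstUser,   -- scalar max() in SQLite
        min(UserID1, UserID2) AS SecondUser   -- scalar min() in SQLite
    FROM Purchases
    WHERE PurchaseDate BETWEEN :start AND :now
)
SELECT FirstUser, SecondUser, COUNT(*) AS Counter
FROM CTE_UserInteractions
GROUP BY FirstUser, SecondUser
HAVING COUNT(*) > :threshold
ORDER BY FirstUser, SecondUser
"""

params = {"start": start.strftime(FMT), "now": now.strftime(FMT), "threshold": 2}
pairs = conn.execute(INTERACTIONS, params).fetchall()
print(pairs)  # [(2, 1, 3)]
```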
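3. Finally, we can use CROSS APPLY to get, for each user pair, the list of products and purchase dates. If performance suffers, we can restructure the subquery, or prefill a temporary table with the results of step 2.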
;WITH CTE_UserInteractions AS (
    SELECT
        (SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS FirstUser,
        (SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS SecondUser
    FROM
        Purchases AS p1
    WHERE
        Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
)
SELECT
    groupedUsers.FirstUser AS UserID1,
    groupedUsers.SecondUser AS UserID2,
    products.ProductID,
    products.PurchaseDate
FROM (
    SELECT
        FirstUser,
        SecondUser
    FROM
        CTE_UserInteractions
    GROUP BY
        FirstUser, SecondUser
    HAVING
        COUNT(*) > @threshold
) groupedUsers
CROSS APPLY (
    -- apply the same date window so that only the counted purchases are listed
    SELECT
        ProductID, PurchaseDate
    FROM
        Purchases AS p1
    WHERE
        p1.UserID1 = groupedUsers.FirstUser AND p1.UserID2 = groupedUsers.SecondUser
        AND p1.Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
    UNION ALL
    SELECT
        ProductID, PurchaseDate
    FROM
        Purchases AS p2
    WHERE
        p2.UserID2 = groupedUsers.FirstUser AND p2.UserID1 = groupedUsers.SecondUser
        AND p2.Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
) products
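SQLite has no CROSS APPLY, so this sqlite3 sketch swaps in a different technique: it joins the grouped pairs back to Purchases on the same canonical (max, min) key and applies the date window to the listed purchases. On the made-up sample data it returns the rows the CROSS APPLY version is meant to return. The data, window bounds, threshold and the PurchaseDate date column are assumptions, and joining on an expression like this is only meant to show the result shape, not to be fast on a large table.

```python
import sqlite3

# Hypothetical sample rows: (UserID1, UserID2, ProductID, PurchaseDate)
ROWS = [
    (1, 2, 12345, "2017-01-18 00:13:52"),
    (2, 1, 5425, "2017-01-12 15:10:02"),
    (1, 2, 64362, "2017-01-05 10:10:02"),
    (1, 2, 777, "2016-10-01 00:00:00"),
    (3, 2, 25235, "2017-01-18 00:13:52"),
    (2, 3, 436346, "2017-01-14 00:13:52"),
    (4, 1, 23523, "2016-11-14 00:13:52"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases (UserID1 INTEGER, UserID2 INTEGER, "
    "ProductID INTEGER, PurchaseDate TEXT)"
)
conn.executemany("INSERT INTO Purchases VALUES (?, ?, ?, ?)", ROWS)

PAIR_PURCHASES = """
WITH CTE_UserInteractions AS (
    SELECT max(UserID1, UserID2) AS FirstUser, min(UserID1, UserID2) AS SecondUser
    FROM Purchases
    WHERE PurchaseDate BETWEEN :start AND :now
),
groupedUsers AS (
    SELECT FirstUser, SecondUser
    FROM CTE_UserInteractions
    GROUP BY FirstUser, SecondUser
    HAVING COUNT(*) > :threshold
)
SELECT g.FirstUser AS UserID1, g.SecondUser AS UserID2, p.ProductID, p.PurchaseDate
FROM groupedUsers AS g
JOIN Purchases AS p
  ON max(p.UserID1, p.UserID2) = g.FirstUser
 AND min(p.UserID1, p.UserID2) = g.SecondUser
WHERE p.PurchaseDate BETWEEN :start AND :now  -- list only the counted purchases
ORDER BY g.FirstUser, g.SecondUser, p.PurchaseDate DESC
"""

params = {"start": "2016-12-20 00:00:00", "now": "2017-01-20 00:00:00", "threshold": 2}
pair_rows = conn.execute(PAIR_PURCHASES, params).fetchall()
for row in pair_rows:
    print(row)
```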
If I understand you correctly, a simple windowed COUNT will help here. The optimizer should be smart enough to do this with a single scan of the table.
DECLARE @threshold AS INT;
DECLARE @days AS INT;
SET @threshold = 10;
SET @days = 31;
WITH
CTE_Purchases
AS
(
    SELECT
        -- swap the columns so that (Col1, Col2) and (Col2, Col1) count as the same pair
        CASE
            WHEN UserID1 < UserID2
            THEN UserID1
            ELSE UserID2
        END AS UserID1
        ,CASE
            WHEN UserID1 < UserID2
            THEN UserID2
            ELSE UserID1
        END AS UserID2
        ,ProductID
        ,PurchaseDate
    FROM
        Purchases
    WHERE
        Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
)
,CTE_Counts
AS
(
    SELECT
        UserID1
        ,UserID2
        ,ProductID
        ,PurchaseDate
        ,COUNT(*) OVER (PARTITION BY UserID1, UserID2) AS Counter
        -- compute the per-pair COUNT without an explicit GROUP BY
    FROM CTE_Purchases
)
SELECT
    UserID1
    ,UserID2
    ,ProductID
    ,PurchaseDate
    ,Counter
FROM CTE_Counts
WHERE Counter > @threshold
-- this filter replaces your HAVING clause
;
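A runnable sketch of the window-count approach, with Python's sqlite3 module standing in for SQL Server (SQLite supports COUNT(*) OVER (...) from version 3.25). Each qualifying purchase row comes back together with its pair's total, so no temporary table or second join is needed. The sample rows, window bounds, threshold of 2 and the use of PurchaseDate instead of Deadline are assumptions.

```python
import sqlite3

# Hypothetical sample rows: (UserID1, UserID2, ProductID, PurchaseDate)
ROWS = [
    (1, 2, 12345, "2017-01-18 00:13:52"),
    (2, 1, 5425, "2017-01-12 15:10:02"),
    (1, 2, 64362, "2017-01-05 10:10:02"),
    (1, 2, 777, "2016-10-01 00:00:00"),
    (3, 2, 25235, "2017-01-18 00:13:52"),
    (2, 3, 436346, "2017-01-14 00:13:52"),
    (4, 1, 23523, "2016-11-14 00:13:52"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases (UserID1 INTEGER, UserID2 INTEGER, "
    "ProductID INTEGER, PurchaseDate TEXT)"
)
conn.executemany("INSERT INTO Purchases VALUES (?, ?, ?, ?)", ROWS)

WINDOW_COUNTS = """
WITH CTE_Purchases AS (
    SELECT
        CASE WHEN UserID1 < UserID2 THEN UserID1 ELSE UserID2 END AS UserID1,
        CASE WHEN UserID1 < UserID2 THEN UserID2 ELSE UserID1 END AS UserID2,
        ProductID,
        PurchaseDate
    FROM Purchases
    WHERE PurchaseDate BETWEEN :start AND :now
),
CTE_Counts AS (
    SELECT UserID1, UserID2, ProductID, PurchaseDate,
           COUNT(*) OVER (PARTITION BY UserID1, UserID2) AS Counter  -- per-pair total on every row
    FROM CTE_Purchases
)
SELECT UserID1, UserID2, ProductID, PurchaseDate, Counter
FROM CTE_Counts
WHERE Counter > :threshold  -- plays the role of HAVING
ORDER BY UserID1, UserID2, PurchaseDate DESC
"""

params = {"start": "2016-12-20 00:00:00", "now": "2017-01-20 00:00:00", "threshold": 2}
window_rows = conn.execute(WINDOW_COUNTS, params).fetchall()
for row in window_rows:
    print(row)
```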
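Could you first post your source table structure, some sample data, and then the complete desired output? It looks as if you are halfway through one solution and asking us to finish it, when a completely different solution might be more appropriate.
Wow. It works. Honestly, I couldn't have come up with that. I still haven't dug into CTEs enough; it seems I need to.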