SQL Server: performance tuning of a recursive CTE
Tags: sql-server, sql-server-2012, recursive-query
I have the following table and sample data:
Table: tbl_nodes
create table tbl_nodes
(
nod1 varchar(50),
nod2 varchar(50)
);
Sample data:
insert into tbl_nodes values('Node1','Node2');
insert into tbl_nodes values('Node2','Node4');
insert into tbl_nodes values('Node2','Node3');
insert into tbl_nodes values('Node2','Node5');
insert into tbl_nodes values('Node3','Node5');
insert into tbl_nodes values('Node3','Node6');
insert into tbl_nodes values('Node6','Node7');
insert into tbl_nodes values('Node10','Node11');
insert into tbl_nodes values('Node6','Node8');
insert into tbl_nodes values('Node18','Node19');
insert into tbl_nodes values('Node9','Node10');
insert into tbl_nodes values('Node12','Node13');
insert into tbl_nodes values('Node15','Node16');
Note: the real table contains more than 5000 records.
Expected result:
------------------------------------
Connectivity
------------------------------------
Node1->Node2->Node3->Node5
Node1->Node2->Node3->Node6->Node7
Node1->Node2->Node3->Node6->Node8
Node1->Node2->Node4
Node1->Node2->Node5
Node9->Node10->Node11
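Explanation of the expected result: I want to find the connections between nodes for chains that contain more than 2 nodes. For example, Node1 is connected to Node2, and Node2 is connected to Node3, Node4, Node5 and so on, as shown in the expected result set. I want to list every connection until an end node is reached; in this example the end nodes are Node4, Node5, Node7, Node8 and Node11.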
I tried the following query:
;WITH CTE AS
(
    SELECT nod1, nod2,
           CAST(nod1 AS VARCHAR(MAX)) + '->' AS conn,
           1 AS lvl
    FROM tbl_nodes T1
    WHERE EXISTS (SELECT 1 FROM tbl_nodes T2 WHERE T1.nod2 = T2.nod1)
       OR EXISTS (SELECT 1 FROM tbl_nodes T3 WHERE T1.nod1 = T3.nod2)
    UNION ALL
    SELECT C1.nod1, C1.nod2,
           C.conn + CAST(C1.nod1 AS VARCHAR(MAX)) + '->',
           C.lvl + 1
    FROM CTE C
    INNER JOIN tbl_nodes C1 ON C.nod2 = C1.nod1
    WHERE CHARINDEX(',' + C.nod2 + ',', C.conn) = 0
),
cte2 AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY nod1, nod2 ORDER BY lvl) AS rn
    FROM CTE
),
cte3 AS
(
    SELECT nod1, nod2, MAX(LEN(conn)) conn, MAX(rn) rn
    FROM cte2
    GROUP BY nod1, nod2
)
SELECT DISTINCT c2.conn + c3.nod2 AS Connectivity
FROM cte3 c3
INNER JOIN cte2 c2 ON c3.rn = c2.rn AND c3.nod1 = c2.nod1
WHERE c3.nod2 NOT IN (SELECT nod1 FROM cte2)
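The query above works on the sample data, but it never returns a result for more than 5000 records: it keeps running without producing any output.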
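Edit: I can't attach the real data because it contains sensitive information, but I will explain. I have a table with the columns Name1 and Name2, which I call Nod1 and Nod2 here. I want to find the relationships between the names, just as the example finds the links between nodes. The first person (Name1) may have made some transactions with a second person (Name2), and Name2 may in turn have made transactions with other people, so I need to find the chain of transactions between these people. It is exactly the same as the given example. I tried running the query on subsets of the data: for 100 records it returns in a few seconds, for 500 records it takes 1 minute, and for 5000 records it keeps running because there are many more permutations and combinations. The problem is that the links have to be found for the full data set (5000 records).
Here is a simplified version of the recursive query that uses the EXISTS operator: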
WITH cte_nodes AS (
    SELECT CAST(nod1 + '->' + nod2 AS VARCHAR(4000)) AS path, nod2
    FROM tbl_nodes AS root
    WHERE NOT EXISTS (
        -- no parent exists thus represents a root node
        SELECT 1
        FROM tbl_nodes
        WHERE nod2 = root.nod1
    ) AND EXISTS (
        -- at least one child exists thus connected with at least one node
        SELECT 1
        FROM tbl_nodes
        WHERE nod1 = root.nod2
    )
    UNION ALL
    SELECT CAST(prnt.path + '->' + chld.nod2 AS VARCHAR(4000)), chld.nod2
    FROM cte_nodes AS prnt
    JOIN tbl_nodes AS chld ON prnt.nod2 = chld.nod1
)
SELECT path
FROM cte_nodes
WHERE NOT EXISTS (
    -- no child exists thus represents a leaf node
    SELECT 1
    FROM tbl_nodes
    WHERE nod1 = cte_nodes.nod2
)
ORDER BY path
OPTION (MAXRECURSION 100) -- increase this value just enough to get the results
There are two issues to address with this problem:
IF OBJECT_ID('tempdb..#tbl_nodes') IS NOT NULL
    DROP TABLE #tbl_nodes;
CREATE TABLE #tbl_nodes (
      nod1 VARCHAR(50)
    , nod2 VARCHAR(50)
);
CREATE NONCLUSTERED INDEX #IX_tbl_nodes_1 ON #tbl_nodes (nod1, nod2);
CREATE NONCLUSTERED INDEX #IX_tbl_nodes_2 ON #tbl_nodes (nod2, nod1);
INSERT INTO #tbl_nodes (nod1, nod2)
VALUES ('Node1','Node2')
     , ('Node2','Node4')
     , ('Node2','Node3')
     , ('Node2','Node5')
     , ('Node3','Node5')
     , ('Node3','Node6')
     , ('Node6','Node7')
     , ('Node10','Node11')
     , ('Node6','Node8')
     , ('Node18','Node19')
     , ('Node9','Node10')
     , ('Node12','Node13')
     , ('Node15','Node16')
     , ('Node8', 'Node3')
;
WITH cte AS (
    SELECT parent.nod1, parent.nod2
         , [link] = CAST('[' + parent.nod1 + '] -> [' + parent.nod2 + ']' AS VARCHAR(MAX))
         , [flag] = f.flag
         , [loop] = 0
         , [stop] = 0
         , [nodes] = 2
    FROM #tbl_nodes AS parent
    LEFT JOIN #tbl_nodes AS child
        ON parent.nod1 = child.nod2
    CROSS APPLY (
        SELECT _f.flag, [rn] = ROW_NUMBER() OVER(ORDER BY _f.flag ASC)
        FROM (
            SELECT [flag] = CAST(1 AS BIT)
            UNION ALL
            SELECT [flag] = CAST(0 AS BIT)
            FROM #tbl_nodes AS __f
            WHERE parent.nod2 = __f.nod1
        ) AS _f
    ) AS f
    WHERE child.nod2 IS NULL
        AND f.rn = 1
    UNION ALL
    SELECT parent.nod1, child.nod2
         , [link] = CAST(parent.link + ' -> [' + child.nod2 + ']' AS VARCHAR(MAX))
         , [flag] = f.flag
         , [loop] = l.loop
         , [stop] = l.stop
         , [nodes] = parent.nodes + 1
    FROM cte AS parent
    CROSS APPLY (
        SELECT _child.nod1, _child.nod2, [rn] = ROW_NUMBER() OVER(PARTITION BY _child.nod2 ORDER BY _child.nod2)
        FROM #tbl_nodes AS _child
        WHERE parent.nod2 = _child.nod1
    ) AS child
    CROSS APPLY (
        SELECT _f.flag, [rn] = ROW_NUMBER() OVER(ORDER BY _f.flag ASC)
        FROM (
            SELECT [flag] = CAST(1 AS BIT)
            UNION ALL
            SELECT [flag] = CAST(0 AS BIT)
            FROM #tbl_nodes AS __f
            WHERE child.nod2 = __f.nod1
        ) AS _f
    ) AS f
    CROSS APPLY (
        SELECT [loop] = CASE WHEN (LEN(parent.link + ' -> [' + child.nod2 + ']') - LEN(REPLACE(parent.link + ' -> [' + child.nod2 + ']', '[' + child.nod2 + ']', ''))) / (LEN(child.nod2) + 2) > 1 THEN 1 ELSE 0 END
             , [stop] = CASE WHEN (LEN(parent.link) - LEN(REPLACE(parent.link, '[' + parent.nod2 + ']', ''))) / (LEN(parent.nod2) + 2) > 1 THEN 1 ELSE 0 END
    ) AS l
    WHERE child.rn = 1
        AND f.rn = 1
        AND l.stop = 0
)
SELECT cte.link, cte.loop
FROM cte
WHERE (cte.flag = 1 OR cte.loop = 1)
    AND cte.nodes > 2
ORDER BY cte.nod1
OPTION (MAXRECURSION 0);
Cheers
Update: at @MAK's request, I updated my answer to return only paths that contain more than 2 nodes.
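The script in this answer indexes the temp table on (nod1, nod2) and on (nod2, nod1), which lets both the child lookups and the parent lookups seek rather than scan. A minimal sketch of the same idea applied to the permanent tbl_nodes table from the question is shown here. The index names are hypothetical, and whether these indexes help on the real 5000-row data is an assumption that should be checked against the actual execution plan.

```sql
-- Hypothetical index names; tbl_nodes is the table defined in the question.
-- Supports "find the children of a node" lookups (joins and filters on nod1).
CREATE NONCLUSTERED INDEX IX_tbl_nodes_nod1_nod2 ON tbl_nodes (nod1, nod2);
-- Supports "find the parents of a node" lookups (root detection on nod2).
CREATE NONCLUSTERED INDEX IX_tbl_nodes_nod2_nod1 ON tbl_nodes (nod2, nod1);
```

Indexes reduce the cost of each recursion step, but they do not reduce the number of paths the recursion has to enumerate.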
Comments:
When you get no results, do you also get an error message? The recursion depth of a common table expression is limited to 100 by default. You can change this by adding an OPTION (MAXRECURSION …) hint to the outermost SELECT statement.
This is a fairly difficult task on SQL Server without graph tables. If you want an alternative solution, try this solution:
Are you getting an error about the maximum recursion limit, or is the query just too slow?
@SalmanA, after 8 minutes I get this error: Msg 1105, Level 17, State 2, Line 1 Could not allocate space for object 'dbo.Large Object Storage System object: ' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup. Msg 9002, Level 17, State 4, Line 1 The transaction log for database 'tempdb' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases.
I think you have run out of space.
Getting the error: The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
Assuming there are no cycles in the graph (Node1 > Node2 > Node3 > Node1), add OPTION (MAXRECURSION 500) at the end. You can actually use a larger value (even 0), but I suggest trying smaller values until it works.
They did have circular references in their previous question, which referenced the table tbl_nodes, @SalmanA. The sample here has none, but unfortunately that doesn't mean their production data doesn't.
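The comments above raise the possibility that the production data contains cycles, in which case raising MAXRECURSION alone only postpones the failure. Note also that the attempt in the question searches conn for ',' + nod2 + ',' while conn is built with '->' separators, so that check appears never to match. The following is a minimal sketch, not the method of any answer here, of a recursive CTE that refuses to revisit a node already on the current path. The table variable @edges, its demo rows, and the column names last_node and visited are hypothetical, and the sketch assumes node names never contain the '|' character.

```sql
-- Hypothetical demo data: Node2 and Node3 point at each other (a cycle).
DECLARE @edges TABLE (nod1 VARCHAR(50), nod2 VARCHAR(50));
INSERT INTO @edges (nod1, nod2)
VALUES ('Node1', 'Node2'), ('Node2', 'Node3'), ('Node3', 'Node2');

WITH paths AS (
    -- Anchor: edges that start at a root (a node that has no parent).
    SELECT r.nod2 AS last_node,
           CAST('|' + r.nod1 + '|' + r.nod2 + '|' AS VARCHAR(MAX)) AS visited,
           CAST(r.nod1 + '->' + r.nod2 AS VARCHAR(MAX)) AS path
    FROM @edges AS r
    WHERE NOT EXISTS (SELECT 1 FROM @edges AS p WHERE p.nod2 = r.nod1)
    UNION ALL
    -- Recursive step: follow an edge only if its target is not on the path yet.
    SELECT c.nod2,
           CAST(p.visited + c.nod2 + '|' AS VARCHAR(MAX)),
           CAST(p.path + '->' + c.nod2 AS VARCHAR(MAX))
    FROM paths AS p
    JOIN @edges AS c ON c.nod1 = p.last_node
    WHERE CHARINDEX('|' + c.nod2 + '|', p.visited) = 0
)
SELECT p.path
FROM paths AS p
-- Keep only paths that cannot be extended by a node not yet visited.
WHERE NOT EXISTS (
    SELECT 1
    FROM @edges AS c
    WHERE c.nod1 = p.last_node
      AND CHARINDEX('|' + c.nod2 + '|', p.visited) = 0
)
ORDER BY p.path
OPTION (MAXRECURSION 0);
```

Because no path can contain the same node twice, the recursion depth is bounded by the number of distinct nodes, which is what makes MAXRECURSION 0 tolerable in this sketch. It does not remove the combinatorial growth in the number of distinct paths, so tempdb pressure on densely connected data remains possible.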