SQL Server: performance tuning of a recursive CTE


I have the following table and sample data:

Table: tbl_nodes

create table tbl_nodes
(
    nod1 varchar(50),
    nod2 varchar(50)
);

Sample data:

insert into tbl_nodes values('Node1','Node2'); 
insert into tbl_nodes values('Node2','Node4'); 
insert into tbl_nodes values('Node2','Node3'); 
insert into tbl_nodes values('Node2','Node5'); 
insert into tbl_nodes values('Node3','Node5'); 
insert into tbl_nodes values('Node3','Node6'); 
insert into tbl_nodes values('Node6','Node7'); 
insert into tbl_nodes values('Node10','Node11');
insert into tbl_nodes values('Node6','Node8');
insert into tbl_nodes values('Node18','Node19');
insert into tbl_nodes values('Node9','Node10');
insert into tbl_nodes values('Node12','Node13');
insert into tbl_nodes values('Node15','Node16');

Note: the real table holds more than 5000 records.

Expected result:

------------------------------------
Connectivity        
------------------------------------
Node1->Node2->Node3->Node5
Node1->Node2->Node3->Node6->Node7
Node1->Node2->Node3->Node6->Node8
Node1->Node2->Node4
Node1->Node2->Node5
Node9->Node10->Node11
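
Explanation of the expected result: I want to find the connections that involve more than two nodes. For example, Node1 is connected to Node2, and Node2 is connected to Node3, Node4 and Node5, as shown in the expected result set. Each connection should be followed until an end node is reached; in this example the end nodes are Node4, Node5, Node7, Node8 and Node11.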

My attempt

I tried the following query:

;WITH CTE AS
(
    SELECT  nod1,nod2,
            CAST(nod1 AS VARCHAR(MAX))+'->' AS conn, 
            1 as lvl 
    from tbl_nodes T1
    where EXISTS (select 1 from tbl_nodes T2 where T1.nod2 =T2.nod1) OR 
    EXISTS (select 1 from tbl_nodes T3 WHERE T1.nod1 =T3.nod2)
    UNION ALL
    SELECT C1.nod1,C1.nod2,
           C.conn+CAST(C1.nod1 AS VARCHAR(MAX))+'->',
           c.lvl+1 
    FROM CTE C INNER JOIN tbl_nodes C1 ON  C.nod2 = C1.nod1
    WHERE CHARINDEX(','+C.nod2+',',C.conn)=0 
),cte2 as 
(
    select * , ROW_NUMBER() over (partition by nod1,nod2 order by lvl)as rn From CTE
),cte3 as
(
    select nod1,nod2 ,MAX(LEN(conn)) conn,MAX(rn) rn
    from cte2
    group by nod1,nod2
)
SELECT DISTINCT c2.conn+c3.nod2 AS Connectivity
from cte3 c3
inner join cte2 c2 on c3.rn = c2.rn and c3.nod1 = c2.nod1
where c3.nod2 not in (select nod1 from cte2)

The query above works on small inputs, but it never returns a result for more than 5000 records: it just keeps running.


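Edit: I cannot attach the real data because it contains sensitive information, but I can describe it. I have a table with the columns Name1 and Name2, which I call Nod1 and Nod2 here. I want to find the relationships between the names, just as the example finds the links between nodes. The first person (Name1) may have made some transactions with a second person (Name2), and Name2 may in turn have made transactions with anyone else, so I need to find the chains of transactions between these people. It is exactly the same problem as the example. I tried running the query on partitions of the data: for 100 records it finishes in a few seconds, for 500 records it takes 1 minute, and for 5000 records it keeps running because there are far more permutations and combinations. The problem is that the full data set (5000 records) is the one we need the links for.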

Here is a simplified version of the recursive query that uses the EXISTS operator:

WITH cte_nodes AS (
    SELECT CAST(nod1 + '->' + nod2 AS VARCHAR(4000)) AS path, nod2
    FROM tbl_nodes AS root
    WHERE NOT EXISTS (
        -- no parent exists thus represents a root node
        SELECT 1
        FROM tbl_nodes
        WHERE nod2 = root.nod1
    ) AND EXISTS (
        -- at least one child exists thus connected with at least one node
        SELECT 1
        FROM tbl_nodes
        WHERE nod1 = root.nod2
    )
    UNION ALL
    SELECT CAST(prnt.path + '->' + chld.nod2 AS VARCHAR(4000)), chld.nod2
    FROM cte_nodes AS prnt
    JOIN tbl_nodes AS chld ON prnt.nod2 = chld.nod1
)
SELECT path
FROM cte_nodes
WHERE NOT EXISTS (
    -- no child exists thus represents a leaf node
    SELECT 1
    FROM tbl_nodes
    WHERE nod1 = cte_nodes.nod2
)
ORDER BY path
OPTION (MAXRECURSION 100) -- increase this value just enough to get the results
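
The simplified query above has no guard against cycles, so a circular reference in the data would keep extending paths until the recursion limit is hit. The following is a minimal sketch of one possible guard, not part of the original answer: it skips an edge whose target already appears on the path. The table `#demo_nodes` and its rows are hypothetical, and the check assumes node names never contain `>` or `-`.

```sql
-- Hypothetical demo table; the last row closes a cycle (Node8 -> Node3).
CREATE TABLE #demo_nodes (nod1 VARCHAR(50), nod2 VARCHAR(50));

INSERT INTO #demo_nodes (nod1, nod2)
VALUES ('Node1','Node2'), ('Node2','Node3'), ('Node2','Node4'),
       ('Node3','Node6'), ('Node6','Node8'), ('Node8','Node3');

WITH cte_nodes AS (
    -- Anchor: edges whose start node has no parent (root nodes).
    SELECT CAST(nod1 + '->' + nod2 AS VARCHAR(4000)) AS path, nod2
    FROM #demo_nodes AS root
    WHERE NOT EXISTS (SELECT 1 FROM #demo_nodes WHERE nod2 = root.nod1)
    UNION ALL
    SELECT CAST(prnt.path + '->' + chld.nod2 AS VARCHAR(4000)), chld.nod2
    FROM cte_nodes AS prnt
    JOIN #demo_nodes AS chld ON prnt.nod2 = chld.nod1
    -- Assumption: wrapping the path as '>Node1->Node2-' makes every node
    -- appear as '>name-', so 'Node1' cannot match inside 'Node10'.
    WHERE CHARINDEX('>' + chld.nod2 + '-', '>' + prnt.path + '-') = 0
)
SELECT path
FROM cte_nodes
ORDER BY path;

DROP TABLE #demo_nodes;
```

With this guard the recursion stops at Node8 instead of re-entering Node3. Note that if the leaf filter from the answer (the final NOT EXISTS) were kept, the path ending at Node8 would be dropped, because Node8 still has an outgoing edge; whether such paths should be reported is a design decision.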

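There are two problems to solve here:

  • Remove the paths that have not reached an end node yet
  • Detect loops (they make the recursive CTE recurse endlessly)

Here is my own answer: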

    IF  OBJECT_ID('tempdb..#tbl_nodes') IS NOT NULL
        DROP TABLE  #tbl_nodes;
    CREATE TABLE    #tbl_nodes (
                        nod1    VARCHAR(50)
                    ,   nod2    VARCHAR(50)
                    );
    
    CREATE NONCLUSTERED INDEX #IX_tbl_nodes_1 ON #tbl_nodes (nod1, nod2);
    CREATE NONCLUSTERED INDEX #IX_tbl_nodes_2 ON #tbl_nodes (nod2, nod1);
    
    INSERT INTO #tbl_nodes (nod1, nod2)
    VALUES  ('Node1','Node2')
        ,   ('Node2','Node4')
        ,   ('Node2','Node3')
        ,   ('Node2','Node5')
        ,   ('Node3','Node5')
        ,   ('Node3','Node6')
        ,   ('Node6','Node7')
        ,   ('Node10','Node11')
        ,   ('Node6','Node8')
        ,   ('Node18','Node19')
        ,   ('Node9','Node10')
        ,   ('Node12','Node13')
        ,   ('Node15','Node16')
        ,   ('Node8', 'Node3')
        ;
    
    
    WITH cte AS (
        SELECT  parent.nod1, parent.nod2
            ,   [link] = CAST('[' + parent.nod1 + '] -> [' + parent.nod2 + ']' AS VARCHAR(MAX))
            ,   [flag] = f.flag
            ,   [loop] = 0
            ,   [stop] = 0
            ,   [nodes] = 2
        FROM        #tbl_nodes AS parent
        LEFT JOIN   #tbl_nodes AS child
                ON  parent.nod1 = child.nod2
        CROSS APPLY (
            SELECT  _f.flag, [rn] = ROW_NUMBER() OVER(ORDER BY _f.flag ASC)
            FROM    (
                SELECT  [flag] = CAST(1 AS BIT)
                UNION ALL
                SELECT  [flag] = CAST(0 AS BIT)
                FROM    #tbl_nodes AS __f
                WHERE   parent.nod2 = __f.nod1
            ) AS _f
        ) AS f
        WHERE   child.nod2 IS NULL
            AND f.rn = 1
        UNION ALL
        SELECT  parent.nod1, child.nod2
            ,   [link] = CAST(parent.link + ' -> [' + child.nod2 + ']' AS VARCHAR(MAX))
            ,   [flag] = f.flag
            ,   [loop] = l.loop
            ,   [stop] = l.stop
            ,   [nodes] = parent.nodes + 1
        FROM        cte AS parent
        CROSS APPLY (
            SELECT  _child.nod1, _child.nod2, [rn] = ROW_NUMBER() OVER(PARTITION BY _child.nod2 ORDER BY _child.nod2)
            FROM    #tbl_nodes AS _child
            WHERE   parent.nod2 = _child.nod1
        ) AS child
        CROSS APPLY (
            SELECT  _f.flag, [rn] = ROW_NUMBER() OVER(ORDER BY _f.flag ASC)
            FROM    (
                SELECT  [flag] = CAST(1 AS BIT)
                UNION ALL
                SELECT  [flag] = CAST(0 AS BIT)
                FROM    #tbl_nodes AS __f
                WHERE   child.nod2 = __f.nod1
            ) AS _f
        ) AS f
        CROSS APPLY (
            SELECT  [loop] = CASE WHEN (LEN(parent.link + ' -> [' + child.nod2 + ']') - LEN(REPLACE(parent.link + ' -> [' + child.nod2 + ']', '[' + child.nod2 + ']', ''))) / (LEN(child.nod2) + 2) > 1 THEN 1 ELSE 0 END
                ,   [stop] = CASE WHEN (LEN(parent.link) - LEN(REPLACE(parent.link, '[' + parent.nod2 + ']', ''))) / (LEN(parent.nod2) + 2) > 1 THEN 1 ELSE 0 END
        ) AS l
        WHERE   child.rn = 1
            AND f.rn = 1
            AND l.stop = 0
    )
    SELECT  cte.link, cte.loop
    FROM    cte
    WHERE   (cte.flag = 1 OR cte.loop = 1)
        AND cte.nodes > 2
    ORDER BY cte.nod1
    OPTION (MAXRECURSION 0);
    
    Cheers


    Update: at @MAK's request, I updated my answer so that it returns only paths that contain more than 2 nodes.
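
    The `[loop]` and `[stop]` expressions in the query above are dense, so here is a small standalone sketch of the technique they rely on: counting how often a bracketed node name occurs in the path by comparing string lengths before and after REPLACE. The variable names and sample values are hypothetical and only illustrate the arithmetic.

```sql
-- Hypothetical sample path in the same '[name] -> [name]' format as [link].
DECLARE @link VARCHAR(MAX) = '[Node1] -> [Node2] -> [Node3] -> [Node6] -> [Node8] -> [Node3]';
DECLARE @node VARCHAR(50)  = 'Node3';

-- Each occurrence of '[Node3]' removes LEN(@node) + 2 characters,
-- so the length difference divided by that width is the occurrence count.
SELECT occurrences =
       (LEN(@link) - LEN(REPLACE(@link, '[' + @node + ']', '')))
       / (LEN(@node) + 2);
```

    For these values the count is 2, which is what the query treats as a loop (count > 1). The brackets matter: searching for `[Node1]` cannot match inside `[Node10]`. LEN ignores trailing spaces, so the difference can be one character larger than expected when the node is last in the path; the integer division absorbs that.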

Comments:

  • When you get no results, do you also get an error message? The recursion depth limit for common table expressions defaults to 100. You can change it by adding an "OPTION (MAXRECURSION …)" hint to the outermost SELECT statement, e.g.:
  • This is a fairly hard task on SQL Server without graph tables. If you want an alternative solution, try the following:
  • Are you getting an error about the maximum recursion limit, or is the query just too slow?
  • @SalmanA, I get this error after 8 minutes: Msg 1105, Level 17, State 2, Line 1: Could not allocate space for object 'dbo.Large Object Storage System object: ' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup. Msg 9002, Level 17, State 4, Line 1: The transaction log for database 'tempdb' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases.
  • I think you have run out of space.
  • I am getting the error: The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
  • Assuming the graph has no cycles (Node1 > Node2 > Node3 > Node1), add OPTION (MAXRECURSION 500) at the end. You can actually use a larger value (even 0), but I suggest trying smaller values until it works.
  • They did have circular references in their previous question, which used the same table tbl_nodes, @SalmanA. The sample here has none, but unfortunately that does not mean their production data has none.
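
To illustrate where the MAXRECURSION hint mentioned in the comments goes, here is a minimal standalone sketch that does not touch tbl_nodes. The CTE name `numbers` is hypothetical; it simply recurses more than 100 times, so it would fail with the "maximum recursion 100 has been exhausted" error under the default limit.

```sql
-- Hypothetical counter CTE: one anchor row plus 499 recursive steps.
WITH numbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM numbers
    WHERE n < 500
)
SELECT COUNT(*) AS row_count
FROM numbers
-- The hint belongs to the outermost statement, after the final SELECT.
OPTION (MAXRECURSION 500);
```

Raising the limit only helps when the recursion is finite. If the data contains cycles, a higher limit (or 0, which removes the limit) just lets the query run longer and consume more tempdb, which is consistent with the tempdb errors reported above.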