Grouping all related records in a many-to-many relationship, SQL graph connected components

Tags: sql, sql-server, sql-server-2012

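Hopefully I'm missing a simple solution to this.

I have two tables. One contains a list of companies. The second contains a list of publishers. The mapping between the two is many-to-many. What I would like to do is bundle or group all of the companies in table A that have any relationship to a publisher in table B, and vice versa.

The final result would look something like the table below, where GROUPID is the key field. Rows 1 and 2 are in the same group because they share the same company. Row 3 is in the same group because publisher Y was already mapped to company A. Row 4 is in the group because company B was already mapped to group 1 through publisher Y.

Put simply, any time there is any kind of shared relationship across company and publisher, that pair should be assigned to the same group.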

ROW   GROUPID     Company     Publisher
1     1           A           Y
2     1           A           X
3     1           B           Y
4     1           B           Z
5     2           C           W
6     2           C           P
7     2           D           W
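
Update: my bounty version: given the table of simple company and publisher pairs in the fiddle above, populate the GROUPID field above. Think of it as creating a family ID that encompasses all related parents/children.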


SQL Server 2012

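You are trying to find all connected components of a graph, which can only be done iteratively. If you know the maximum width of any connected component (that is, the maximum number of links you have to follow to get from one company/publisher to another), you could in principle do it like this: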

SELECT
    MIN(x2.groupID) AS groupID,
    x1.Company,
    x1.Publisher
FROM Table1 AS x1
    INNER JOIN (
        SELECT
            MIN(x2.Company) AS groupID,
            x1.Company,
            x1.Publisher
        FROM Table1 AS x1
            INNER JOIN Table1 AS x2
            ON x1.Publisher = x2.Publisher
        GROUP BY
            x1.Publisher,
            x1.Company
    ) AS x2
    ON x1.Company = x2.Company
GROUP BY
    x1.Publisher,
    x1.Company;
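
You have to keep nesting the subquery, alternating the join between Company and Publisher, until you reach the maximum iteration depth. The deepest subquery takes MIN(Company); every level above it takes MIN(groupID).

I don't really recommend doing this, though; it would be cleaner to do it outside SQL.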

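Disclaimer: I don't know anything about SQL Server 2012 (or any other version); it may have some additional scripting capability that lets you do this iteration dynamically.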
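
To make the "outside SQL" option concrete, here is a minimal union-find sketch in Python, run on the seven sample rows from the question. It is only an illustration under stated assumptions: the function name `assign_group_ids`, the `("C", ...)`/`("P", ...)` tagging of nodes, and numbering groups by first appearance are choices made for this sketch and do not come from any answer in the thread.

```python
def assign_group_ids(pairs):
    """Map each (company, publisher) pair to a connected-component id."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag nodes so a company and a publisher with the same name stay distinct.
    for company, publisher in pairs:
        union(("C", company), ("P", publisher))

    # Number the components in order of first appearance (an assumption).
    group_of_root = {}
    result = []
    for company, publisher in pairs:
        root = find(("C", company))
        group_id = group_of_root.setdefault(root, len(group_of_root) + 1)
        result.append((group_id, company, publisher))
    return result


pairs = [("A", "Y"), ("A", "X"), ("B", "Y"), ("B", "Z"),
         ("C", "W"), ("C", "P"), ("D", "W")]
for row in assign_group_ids(pairs):
    print(row)
```

Each row unions its company with its publisher, so any chain of shared companies or publishers ends up under one root. The rows would still have to be read from and written back to the database, which this sketch leaves out.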

Here is my solution.

As I suspected, the nature of the relationship requires a loop.

Here is the SQL:

--drop TABLE Table1

CREATE TABLE Table1
    ([row] int identity (1,1),GroupID INT NULL,[Company] varchar(2), [Publisher] varchar(2))
;

INSERT INTO Table1
    (Company, Publisher)
select
    left(newid(), 2), left(newid(), 2)

declare @i int = 1

while @i < 8
begin
    ;with cte(Company, Publisher) as (
        select
            left(newid(), 2), left(newid(), 2)
        from Table1
    )
    insert into Table1(Company, Publisher)
    select distinct c.Company, c.Publisher
    from cte as c
    where not exists (select * from Table1 as t where t.Company = c.Company and t.Publisher = c.Publisher)

    set @i = @i + 1
end;


CREATE NONCLUSTERED INDEX IX_Temp1 on Table1 (Company)
CREATE NONCLUSTERED INDEX IX_Temp2 on Table1 (Publisher)

declare @counter int=0
declare @row int=0
declare @lastnullcount int=0
declare @currentnullcount int=0

WHILE EXISTS (
  SELECT *
  FROM Table1
  where GroupID is null
  )
BEGIN
    SET @counter=@counter+1
    SET @lastnullcount =0

    SELECT TOP 1
        @row=[row]
    FROM Table1
    where GroupID is null
    order by [row] asc

    SELECT @currentnullcount=count(*) from table1 where groupid is null
    WHILE @lastnullcount <> @currentnullcount
    BEGIN
        SELECT @lastnullcount=count(*)
        from table1
        where groupid is null 

        UPDATE Table1
        SET GroupID=@counter
        WHERE [row]=@row

        UPDATE t2
        SET t2.GroupID=@counter
        FROM Table1 t1
        INNER JOIN Table1 t2 on t1.Company=t2.Company
        WHERE t1.GroupID=@counter
        AND t2.GroupID IS NULL

        UPDATE t2
        SET t2.GroupID=@counter
        FROM Table1 t1
        INNER JOIN Table1 t2 on t1.publisher=t2.publisher
        WHERE t1.GroupID=@counter
        AND t2.GroupID IS NULL

        SELECT @currentnullcount=count(*)
        from table1
        where groupid is null
    END
END

SELECT * FROM Table1

Edit: added the indexes I would expect on the real table, and brought the test data more in line with the other data set Roman is using.

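I considered using a recursive CTE, but as far as I know SQL Server does not allow UNION between the anchor member and the recursive member of a recursive CTE (I think PostgreSQL does), so there is no way to eliminate duplicates.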

declare @i int

;with cte as (
     select
         GroupID,
         row_number() over(order by Company) as rn
     from Table1
)
update cte set GroupID = rn

select @i = @@rowcount

-- while some rows updated
while @i > 0
begin
    update T1 set
        GroupID = T2.GroupID
    from Table1 as T1
        inner join (
            select T2.Company, min(T2.GroupID) as GroupID
            from Table1 as T2
            group by T2.Company
        ) as T2 on T2.Company = T1.Company
    where T1.GroupID > T2.GroupID

    select @i = @@rowcount

    update T1 set
        GroupID = T2.GroupID
    from Table1 as T1
        inner join (
            select T2.Publisher, min(T2.GroupID) as GroupID
            from Table1 as T2
            group by T2.Publisher
        ) as T2 on T2.Publisher = T1.Publisher
    where T1.GroupID > T2.GroupID

    -- will be > 0 if any rows updated
    select @i = @i + @@rowcount
end

;with cte as (
     select
         GroupID,
         dense_rank() over(order by GroupID) as rn
     from Table1
)
update cte set GroupID = rn
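
For readers who want to trace that loop by hand, the following is a minimal Python sketch of the same idea, not the answer's code: give every row its own label, repeatedly pull each row down to the smallest label within its company and then within its publisher, stop when nothing changes, and renumber the surviving labels densely. The function name `propagate_min_labels` and the use of row position as the starting label are assumptions made for the sketch.

```python
def propagate_min_labels(pairs):
    """Repeat company-wise and publisher-wise minimum passes until stable."""
    # Start with a distinct label per row (its 1-based position).
    labels = list(range(1, len(pairs) + 1))

    def one_pass(key_index):
        # Smallest label currently seen for each company (0) or publisher (1).
        smallest = {}
        for pair, label in zip(pairs, labels):
            key = pair[key_index]
            smallest[key] = min(label, smallest.get(key, label))
        changed = 0
        for i, pair in enumerate(pairs):
            best = smallest[pair[key_index]]
            if labels[i] > best:
                labels[i] = best
                changed += 1
        return changed

    # Keep going while either pass still lowers some label.
    while one_pass(0) + one_pass(1) > 0:
        pass

    # Renumber the surviving labels densely: 1, 2, 3, ...
    dense = {label: rank
             for rank, label in enumerate(sorted(set(labels)), start=1)}
    return [dense[label] for label in labels]


pairs = [("A", "Y"), ("A", "X"), ("B", "Y"), ("B", "Z"),
         ("C", "W"), ("C", "P"), ("D", "W")]
print(propagate_min_labels(pairs))
```

On the sample data the labels settle after two rounds; long chains of alternating companies and publishers need more rounds, which is where the cost of this approach comes from.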
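
I also tried a breadth-first search algorithm. I thought it could be faster, since it is better in terms of complexity, so I'm providing that solution here as well. It turned out not to be faster than the SQL approach, though: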

declare @Company nvarchar(2), @Publisher nvarchar(2), @GroupID int

declare @Queue table (
    Company nvarchar(2), Publisher nvarchar(2), ID int identity(1, 1),
    primary key(Company, Publisher)
)

select @GroupID = 0

while 1 = 1
begin
    select top 1 @Company = Company, @Publisher = Publisher
    from Table1
    where GroupID is null

    if @@rowcount = 0 break

    select @GroupID = @GroupID + 1

    insert into @Queue(Company, Publisher)
    select @Company, @Publisher

    while 1 = 1
    begin
        select top 1 @Company = Company, @Publisher = Publisher
        from @Queue
        order by ID asc

        if @@rowcount = 0 break

        update Table1 set
            GroupID = @GroupID
        where Company = @Company and Publisher = @Publisher

        delete from @Queue where Company = @Company and Publisher = @Publisher

        ;with cte as (
            select Company, Publisher from Table1 where Company = @Company and GroupID is null
            union all
            select Company, Publisher from Table1 where Publisher = @Publisher and GroupID is null
        )
        insert into @Queue(Company, Publisher)
        select distinct c.Company, c.Publisher
        from cte as c
        where not exists (select * from @Queue as q where q.Company = c.Company and q.Publisher = c.Publisher)
   end
end
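
The breadth-first idea may be easier to see without the table-variable plumbing. Below is a minimal Python sketch under a few assumptions: `bfs_group_ids` is a hypothetical name, rows are identified by list position, and a row is labelled when it is enqueued rather than when it is dequeued, which differs slightly from the T-SQL but reaches the same groups.

```python
from collections import defaultdict, deque


def bfs_group_ids(pairs):
    """Label each (company, publisher) row with a group id found by BFS."""
    rows_by_company = defaultdict(list)
    rows_by_publisher = defaultdict(list)
    for index, (company, publisher) in enumerate(pairs):
        rows_by_company[company].append(index)
        rows_by_publisher[publisher].append(index)

    group_ids = [None] * len(pairs)
    next_group = 0
    for start in range(len(pairs)):
        if group_ids[start] is not None:
            continue  # already reached from an earlier row
        next_group += 1
        group_ids[start] = next_group
        queue = deque([start])
        while queue:
            company, publisher = pairs[queue.popleft()]
            # Neighbours are rows sharing the company or the publisher.
            neighbours = rows_by_company[company] + rows_by_publisher[publisher]
            for neighbour in neighbours:
                if group_ids[neighbour] is None:
                    group_ids[neighbour] = next_group
                    queue.append(neighbour)
    return group_ids


pairs = [("A", "Y"), ("A", "X"), ("B", "Y"), ("B", "Z"),
         ("C", "W"), ("C", "P"), ("D", "W")]
print(bfs_group_ids(pairs))
```

Each row enters the queue at most once, so the number of dequeues is linear in the number of rows; the T-SQL version pays extra per step because every dequeue is a separate statement.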
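
I've tested my versions and Gordon Linoff's to check how they perform. The CTE looks much worse; I couldn't wait for it to finish on more than 1000 rows.

Here is an example with random data. My results were:

Rows   My RBAR solution   My SQL solution   Gordon Linoff's solution
128    190 ms             27 ms             958 ms
256    560 ms             1226 ms           45371 ms

This is random data, so the results may not be very consistent. I think the timings could be changed by indexes, but I don't think that would change the overall picture.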

Old version, using a temporary table and only calculating GroupID without touching the initial table:

declare @i int

-- creating table to gather all possible GroupID for each row
create table #Temp
(
    Company varchar(1), Publisher varchar(1), GroupID varchar(1),
    primary key (Company, Publisher, GroupID)
)

-- initializing it with data
insert into #Temp (Company, Publisher, GroupID)
select Company, Publisher, Company
from Table1

select @i = @@rowcount

-- while some rows inserted into #Temp
while @i > 0
begin
    -- expand #Temp in both directions
    ;with cte as (
        select
            T2.Company, T1.Publisher,
            T1.GroupID as GroupID1, T2.GroupID as GroupID2
        from #Temp as T1
            inner join #Temp as T2 on T2.Company = T1.Company
        union
        select
            T1.Company, T2.Publisher,
            T1.GroupID as GroupID1, T2.GroupID as GroupID2
        from #Temp as T1
            inner join #Temp as T2 on T2.Publisher = T1.Publisher        
    ), cte2 as (
        select
            Company, Publisher,
            case when GroupID1 < GroupID2 then GroupID1 else GroupID2 end as GroupID
        from cte
    )
    insert into #Temp
    select Company, Publisher, GroupID
    from cte2
    -- don't insert duplicates
    except
    select Company, Publisher, GroupID
    from #Temp

    -- will be > 0 if any row inserted
    select @i = @@rowcount
end

select
    Company, Publisher,
    dense_rank() over(order by min(GroupID)) as GroupID
from #Temp
group by Company, Publisher

Here is a recursive solution using XML:

with a as ( -- recursive result, containing shorter subsets and duplicates
    select cast('<c>' + company + '</c>' as xml) as companies
          ,cast('<p>' + publisher + '</p>' as xml) as publishers
      from Table1

    union all

    select a.companies.query('for $c in distinct-values((for $i in /c return string($i),
                                                        sql:column("t.company")))
                          order by $c
                          return <c>{$c}</c>')
          ,a.publishers.query('for $p in distinct-values((for $i in /p return string($i),
                                                         sql:column("t.publisher")))
                          order by $p
                          return <p>{$p}</p>')
    from a join Table1 t
      on (   a.companies.exist('/c[text() = sql:column("t.company")]') = 0 
          or a.publishers.exist('/p[text() = sql:column("t.publisher")]') = 0)
     and (   a.companies.exist('/c[text() = sql:column("t.company")]') = 1
          or a.publishers.exist('/p[text() = sql:column("t.publisher")]') = 1)
), b as ( -- remove the shorter versions from earlier steps of the recursion and the duplicates
    select distinct -- distinct cannot work on xml types, hence cast to nvarchar
           cast(companies as nvarchar) as companies
          ,cast(publishers as nvarchar) as publishers
          ,DENSE_RANK() over(order by cast(companies as nvarchar), cast(publishers as nvarchar)) as groupid
     from a
    where not exists (select 1 from a as s -- s is a proper subset of a
                       where (cast('<s>' + cast(s.companies as varchar)
                                 + '</s><a>' + cast(a.companies as varchar) + '</a>' as xml)
                             ).value('if((count(/s/c) > count(/a/c))
                                         and (some $s in /s/c/text() satisfies
                                             (some $a in /a/c/text() satisfies $s = $a))
                                      ) then 1 else 0', 'int') = 1
                     )
      and not exists (select 1 from a as s -- s is a proper subset of a
                       where (cast('<s>' + cast(s.publishers as nvarchar)
                                 + '</s><a>' + cast(a.publishers as nvarchar) + '</a>' as xml)
                             ).value('if((count(/s/p) > count(/a/p))
                                         and (some $s in /s/p/text() satisfies
                                             (some $a in /a/p/text() satisfies $s = $a))
                                      ) then 1 else 0', 'int') = 1
                     )
), c as (  -- cast back to xml
    select cast(companies as xml) as companies
          ,cast(publishers as xml) as publishers
          ,groupid
      from b
)
select Co.company.value('(./text())[1]', 'varchar') as company
      ,Pu.publisher.value('(./text())[1]', 'varchar') as publisher
      ,c.groupid
  from c
       cross apply companies.nodes('/c') as Co(company)
       cross apply publishers.nodes('/p') as Pu(publisher)
 where exists(select 1 from Table1 t -- restrict to only the combinations that exist in the source
               where t.company = Co.company.value('(./text())[1]', 'varchar')
                 and t.publisher = Pu.publisher.value('(./text())[1]', 'varchar')
             )

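In the intermediate steps, the set of companies and the set of publishers are kept in XML fields. Because of some SQL Server limitations (for example, you cannot group by or use distinct on an XML column), the values have to be cast back and forth between XML and nvarchar.

Your problem is a graph-walking problem of finding connected subgraphs. It is a bit more challenging because your data structure has two types of nodes (companies and publishers) rather than one.

You can solve this with a single recursive CTE. The logic is as follows.

First, convert the problem into a graph with only one type of node. I do this by making the companies the nodes and using the publisher information to create the edges that link companies. This is just a join: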

      select t1.company as node1, t2.company as node2
      from table1 t1 join
           table1 t2
           on t1.publisher = t2.publisher

As a matter of efficiency, you could also add t1.company <> t2.company, but that is not strictly necessary.
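
As a rough illustration of what that self-join produces, here is a small Python sketch that builds the company-to-company edges from the question's sample rows. The function name `company_edges` is hypothetical, and the sketch leaves out self-pairs, so a company that shares no publisher with any other company yields no edge here.

```python
from collections import defaultdict
from itertools import permutations


def company_edges(pairs):
    """Return directed company-to-company edges for shared publishers."""
    companies_by_publisher = defaultdict(set)
    for company, publisher in pairs:
        companies_by_publisher[publisher].add(company)

    edges = set()
    for companies in companies_by_publisher.values():
        # Every ordered pair of distinct companies under one publisher
        # becomes an edge, mirroring the join on publisher.
        edges.update(permutations(sorted(companies), 2))
    return edges


pairs = [("A", "Y"), ("A", "X"), ("B", "Y"), ("B", "Z"),
         ("C", "W"), ("C", "P"), ("D", "W")]
print(sorted(company_edges(pairs)))
```

Only publishers Y and W are shared in the sample, so the edges link A with B and C with D in both directions.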

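Now this is a "simple" graph-walking problem, where a recursive CTE is used to create all connections between two nodes. The recursive CTE walks through the graph using a join. Along the way, it keeps a list of all nodes visited. In SQL Server, this list needs to be stored in a string.

The code needs to make sure it doesn't visit a node twice on a given path, because that can result in infinite recursion (and an error). If the join above is called edges, the CTE that generates all pairs of connected nodes looks like this: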

     cte as (
      select e.node1, e.node2, cast('|'+e.node1+'|'+e.node2+'|' as varchar(max)) as nodes,
             1 as level
      from edges e
      union all
      select c.node1, e.node2, c.nodes+e.node2+'|', 1+c.level
      from cte c join
           edges e
           on c.node2 = e.node1 and
              c.nodes not like '%|'+e.node2+'|%'
     )
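
Now, with this list of connected nodes, assign each node the minimum of all the nodes it is connected to, including itself. This serves as an identifier of the connected subgraph. That is, all companies connected to each other through publishers will have the same minimum.

The final two steps are to enumerate this minimum as the GroupId and to join the GroupId back to the original data.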
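
These last steps can be sketched in Python as follows. This is an illustration only: `group_ids_from_connections` is a hypothetical name, and the `connections` list is written out by hand for the sample data rather than produced by a graph walk, so it is assumed to already contain every pair of connected companies.

```python
def group_ids_from_connections(pairs, connections):
    """connections: (node1, node2) company pairs known to be connected."""
    # Smallest company name reachable from each company, itself included.
    smallest = {company: company for company, _ in pairs}
    for node1, node2 in connections:
        smallest[node1] = min(smallest[node1], node2)

    # Enumerate the distinct minima as 1, 2, 3, ... (a dense rank).
    minima = sorted(set(smallest.values()))
    rank = {name: i for i, name in enumerate(minima, start=1)}

    # Join the group id back onto the original rows.
    return [(company, publisher, rank[smallest[company]])
            for company, publisher in pairs]


pairs = [("A", "Y"), ("A", "X"), ("B", "Y"), ("B", "Z"),
         ("C", "W"), ("C", "P"), ("D", "W")]
# Hand-written for the sample: A-B and C-D are the only connected pairs.
connections = [("A", "B"), ("B", "A"), ("C", "D"), ("D", "C")]
for row in group_ids_from_connections(pairs, connections):
    print(row)
```

Because every company in a component sees the same smallest name, the minimum works as a stable component key, and the dense rank only turns those keys into consecutive integers.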

The full (and, I might add, tested) query looks like this:

with edges as (
      select t1.company as node1, t2.company as node2
      from table1 t1 join
           table1 t2
           on t1.publisher = t2.publisher
     ),
     cte as (
      select e.node1, e.node2,
             cast('|'+e.node1+'|'+e.node2+'|' as varchar(max)) as nodes,
             1 as level
      from edges e
      union all
      select c.node1, e.node2,
             c.nodes+e.node2+'|',
             1+c.level
      from cte c join
           edges e
           on c.node2 = e.node1 and
              c.nodes not like '%|'+e.node2+'|%'
     ),
     nodes as (
       select node1,
              (case when min(node2) < node1 then min(node2) else node1 end
              ) as grp
       from cte
       group by node1
      )
select t.company, t.publisher, grp.GroupId
from table1 t join
     (select n.node1, dense_rank() over (order by grp) as GroupId
      from nodes n
     ) grp
     on t.company = grp.node1;
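
Note that this works for finding any connected subgraph. It doesn't assume any particular number of levels.

EDIT:

The question of performance here is vexing. At a minimum, the query above will run better with an index on Publisher. Better yet is to take @MikaelEriksson's suggestion and put the edges in a separate table.

Another question is whether you are looking for equivalence classes among companies or among publishers. I took the approach of using companies, because I think that is easier to explain. (My inclination to respond was based on the numerous comments saying this could not be done with CTEs.)

I am guessing that you could get reasonable performance from this, although that requires more knowledge of the data and the system than the OP provides. It is quite likely, though, that the best performance will come from a multiple-query approach.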

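A bit late to the party, and since SQLFiddle seems to be down I had to guess at your data structure. Nevertheless, it seemed like a fun challenge, and this is what I made of it:

Setup: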

IF OBJECT_ID('t_link') IS NOT NULL DROP TABLE t_link
IF OBJECT_ID('t_company') IS NOT NULL DROP TABLE t_company
IF OBJECT_ID('t_publisher') IS NOT NULL DROP TABLE t_publisher
IF OBJECT_ID('tempdb..#link_A') IS NOT NULL DROP TABLE #link_A
IF OBJECT_ID('tempdb..#link_B') IS NOT NULL DROP TABLE #link_B
GO

CREATE TABLE t_company ( company_id     int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
                         company_name   varchar(100) NOT NULL)

GO 

CREATE TABLE t_publisher (publisher_id     int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
                          publisher_name   varchar(100) NOT NULL)

CREATE TABLE t_link (company_id int NOT NULL FOREIGN KEY (company_id) REFERENCES t_company (company_id),
                     publisher_id int NOT NULL FOREIGN KEY (publisher_id) REFERENCES t_publisher (publisher_id),
                                PRIMARY KEY (company_id, publisher_id),
                     group_id int NULL
                             )
GO

-- example content


-- ROW   GROUPID     Company     Publisher
--1     1           A           Y
--2     1           A           X
--3     1           B           Y
--4     1           B           Z
--5     2           C           W
--6     2           C           P
--7     2           D           W


INSERT t_company (company_name) VALUES ('A'), ('B'), ('C'), ('D')
INSERT t_publisher (publisher_name) VALUES ('X'), ('Y'), ('Z'), ('W'), ('P')

INSERT t_link (company_id, publisher_id)
SELECT company_id, publisher_id
  FROM t_company, t_publisher
 WHERE (company_name = 'A' AND publisher_name = 'Y')
    OR (company_name = 'A' AND publisher_name = 'X')
    OR (company_name = 'B' AND publisher_name = 'Y')
    OR (company_name = 'B' AND publisher_name = 'Z')
    OR (company_name = 'C' AND publisher_name = 'W')
    OR (company_name = 'C' AND publisher_name = 'P')
    OR (company_name = 'D' AND publisher_name = 'W')




GO

/*
-- volume testing

TRUNCATE TABLE t_link
DELETE t_company
DELETE t_publisher


DECLARE @company_count   int = 1000,
        @publisher_count int = 450,
        @links_count     int = 800


INSERT t_company (company_name)
SELECT company_name    = Convert(varchar(100), NewID())
  FROM master.dbo.fn_int_list(1, @company_count) 

UPDATE STATISTICS t_company

INSERT t_publisher (publisher_name)
SELECT publisher_name  = Convert(varchar(100), NewID())
  FROM master.dbo.fn_int_list(1, @publisher_count) 

UPDATE STATISTICS t_publisher

-- Random links between the companies & publishers

DECLARE @count int
SELECT @count = 0

WHILE @count < @links_count
    BEGIN

        SELECT TOP 30 PERCENT row_id = IDENTITY(int, 1, 1), company_id = company_id + 0
          INTO #link_A
          FROM t_company
         ORDER BY NewID()

        SELECT TOP 30 PERCENT row_id = IDENTITY(int, 1, 1), publisher_id = publisher_id + 0
          INTO #link_B
          FROM t_publisher
         ORDER BY NewID()

        INSERT TOP (@links_count - @count) t_link (company_id, publisher_id)
        SELECT A.company_id,
               B.publisher_id
          FROM #link_A A
          JOIN #link_B B
            ON A.row_id = B.row_id
         WHERE NOT EXISTS ( SELECT *
                              FROM t_link old
                             WHERE old.company_id   = A.company_id
                               AND old.publisher_id = B.publisher_id)

        SELECT @count = @count + @@ROWCOUNT

        DROP TABLE #link_A
        DROP TABLE #link_B    
    END

*/

Actual grouping:

IF OBJECT_ID('tempdb..#links') IS NOT NULL DROP TABLE #links
GO

-- apply grouping

-- init
SELECT row_id = IDENTITY(int, 1, 1), 
       company_id,
       publisher_id,
       group_id = 0
  INTO #links
  FROM t_link

-- don't see an index that would be actually helpful here right-away, using row_id to avoid HEAP
CREATE CLUSTERED INDEX idx0 ON #links (row_id)
--CREATE INDEX idx1 ON #links (company_id)   
--CREATE INDEX idx2 ON #links (publisher_id)

UPDATE #links
   SET group_id = row_id


-- start grouping
WHILE @@ROWCOUNT > 0
    BEGIN  
        UPDATE #links
           SET group_id = new_group_id
          FROM #links upd
          CROSS APPLY (SELECT new_group_id = Min(group_id)
                         FROM #links new
                        WHERE new.company_id   = upd.company_id
                           OR new.publisher_id = upd.publisher_id 
                                     ) x
        WHERE upd.group_id > new_group_id

        -- select * from #links
    END


-- remove 'holes'
UPDATE #links
   SET group_id = (SELECT COUNT(DISTINCT o.group_id) 
                          FROM #links o
                         WHERE o.group_id <= upd.group_id)
  FROM #links upd

GO

UPDATE t_link
   SET group_id = new.group_id
  FROM t_link upd
  LEFT OUTER JOIN #links new
               ON new.company_id = upd.company_id
              AND new.publisher_id = upd.publisher_id

GO    
SELECT row = ROW_NUMBER() OVER (ORDER BY group_id, company_name, publisher_name),
       l.group_id,
       c.company_name, -- c.company_id,
       p.publisher_name -- , p.publisher_id
 from t_link l
 JOIN t_company c
   ON l.company_id = c.company_id
 JOIN t_publisher p 
   ON p.publisher_id = l.publisher_id
 ORDER BY 1
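
At first glance this approach hasn't been tried by anyone else yet; it's interesting to see how this can be done in various ways (I preferred not to read the other answers up front, as that would spoil the puzzle =).

As far as I understand the requirements and the example, the results look as expected, and performance isn't too shabby either, although there is no real indication of how many records this should handle. I'm not sure how it will scale, but I don't expect too many problems.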

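Which version of SQL Server are you using?

I'm going to undo the edit you just made and redo it in a simpler way, using the "code sample" button.

@GoatCO I've submitted a rollback.

Oops, thanks. I'll get it right next time :)

@SpectralGhost Yes, it's not my question; I'm just putting a bounty on it to get some ideas. Typically, in child/parent problems, a recursive CTE is used to connect successive pairs, but I've never seen an example that groups the results in both directions into "families".

As you can see in my answer, a recursive solution is possible.

I also considered using a CTE, but SQL Server has a limit on recursion depth; not sure whether the OP would run into it.

You can use option maxrecursion 0. The main problem is how to eliminate duplicates.

MAXRECURSION 0 is my friend; that part wouldn't bother me.

@RomanPekar I don't want to get into a performance war, but your table structure (with the primary key on company and publisher) is optimized for your approach. Testing on SQLFiddle, I see somewhat different timings when I put in an identity primary key, so that there is no index on those fields. But kudos for putting the timings together.

Since you're the one doing the performance tests, I'm posting my comment here. @GordonLinoff's solution suffers because the edges CTE is executed for every invocation of the recursive part of the CTE. It would be better to store the edges CTE in a suitably indexed temp table and use that in the recursive CTE. I don't know by how much, though.

@GordonLinoff Check my updated answer with some performance tests. I think it isn't possible to process more than 10000 rows this way if the groups are long. No time to test today, though.

Thanks for the thorough answers. I would split the bounty if that were possible, but I went with Roman's answer because of his effort on the performance testing.

@GoatCO Unless the question is closed, new answers may appear. Up- and down-votes will also change the order. Your comment may then lose its real value. When referring to other answers, you can link to the "share" URL and avoid terms such as "other", "above" and "below". In SQL terms, it's like recording "birthday" instead of "age": you refer to something immutable rather than something variable.

@GoatCO FYI, after performance testing, my solution performs significantly better on a local SQL Server instance than any CTE example I've seen in this thread. I'll ask Roman to add it to his performance comparison.

I've run all the answers through a set of examples, from 1000 records up to 1 million records, but I'll revisit it.

I don't have much time right now; here is a quick check of the two solutions -

@RomanPekar What's really odd is that I get completely different results when I run it on a local SQL Server instance. Scale up the rows and you really start to see the difference. Not sure why SQL Fiddle is so different.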