SQL: identify/compare sets of rows within groups

Tags: sql, sql-server, tsql

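I have a problem that looked easy to solve, but it is giving me trouble.

Simplified, I need a way to identify the unique sets of rows within groups defined by another column. In the basic example, the source table contains only these columns: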

routeID nodeID nodeName 
   1       1      a
   1       2      b
   2       1      a
   2       2      b
   3       1      a
   3       2      b
   4       1      a
   4       2      c
   5       1      a
   5       2      c
   6       1      a
   6       2      b
   6       3      d
   7       1      a
   7       2      b
   7       3      d 
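So the routeID column refers to the set of nodes that defines a route.

What I need to do is group the routes somehow, so that each unique sequence of nodes is represented by just one routeID.

In my real case I tried using window functions to add columns that help identify the node sequences, but I still don't see how to get those unique sequences and group the routes.

As the final result I want only the unique routes; for example, routes 1, 2 and 3 should be aggregated into a single route.

Do you have any idea how to help me?

Edit:

The other table that I want to join with the one from the example might look like this: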

journeyID nodeID nodeName routeID
    1        1       a       1
    1        2       b       1
    2        1       a       1
    2        2       b       1
    3        1       a       4
    3        2       c       4 
    ...........................
    ...........................

You can try this idea:

DECLARE @DataSource TABLE
(
    [routeID] TINYINT 
   ,[nodeID] TINYINT
   ,[nodeName] CHAR(1)
);

INSERT INTO @DataSource ([routeID], [nodeID], [nodeName])
VALUES   (1, 1, 'a')
        ,(1, 2, 'b')
        ,(2, 1, 'a')
        ,(2, 2, 'b')
        ,(3, 1, 'a')
        ,(3, 2, 'b')
        ,(4, 1, 'a')
        ,(4, 2, 'c')
        ,(5, 1, 'a')
        ,(5, 2, 'c')
        ,(6, 1, 'a')
        ,(6, 2, 'b')
        ,(6, 3, 'd')
        ,(7, 1, 'a')
        ,(7, 2, 'b')
        ,(7, 3, 'd');


SELECT DS.[routeID]
      ,nodes.[value]
      ,ROW_NUMBER() OVER (PARTITION BY nodes.[value] ORDER BY [routeID]) AS [rowID]
FROM 
(   
    -- getting unique route ids
    SELECT DISTINCT [routeID]
    FROM @DataSource DS
) DS ([routeID])
CROSS APPLY
(
    -- for each route id, build a CSV list of its node names ordered by node id
    SELECT STUFF
    (
        (
            SELECT ',' + [nodeName]
            FROM @DataSource DSI
            WHERE DSI.[routeID] = DS.[routeID]
            ORDER BY [nodeID]
            FOR XML PATH(''), TYPE
        ).value('.', 'VARCHAR(MAX)')
        ,1
        ,1
        ,''
    )
) nodes ([value]);
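
The code will give you the following output:

routeID value rowID
   1    a,b     1
   2    a,b     2
   3    a,b     3
   4    a,c     1
   5    a,c     2
   6    a,b,d   1
   7    a,b,d   2

So you only need to filter by rowID = 1. Of course, you can change the code to suit your business criteria (for example, showing the last route ID with a given node sequence instead of the first).

Also, the ROW_NUMBER function cannot be used directly in a WHERE clause, so you need to wrap the code before filtering: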

WITH DataSource AS
(
    SELECT DS.[routeID]
          ,nodes.[value]
          ,ROW_NUMBER() OVER (PARTITION BY nodes.[value] ORDER BY [routeID]) AS [rowID]
    FROM 
    (   
        -- getting unique route ids
        SELECT DISTINCT [routeID]
        FROM @DataSource DS
    ) DS ([routeID])
    CROSS APPLY
    (
        -- for each route id, build a CSV list of its node names ordered by node id
        SELECT STUFF
        (
            (
                SELECT ',' + [nodeName]
                FROM @DataSource DSI
                WHERE DSI.[routeID] = DS.[routeID]
                ORDER BY [nodeID]
                FOR XML PATH(''), TYPE
            ).value('.', 'VARCHAR(MAX)')
            ,1
            ,1
            ,''
        )
    ) nodes ([value])
)
SELECT DS2.*
FROM DataSource DS1
INNER JOIN @DataSource DS2
    ON DS1.[routeID] = DS2.[routeID]
WHERE DS1.[rowID] = 1;
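
On SQL Server 2017 or later, the FOR XML PATH concatenation above can probably be replaced by STRING_AGG, which makes the per-route signature easier to read. The following is a minimal sketch under that version assumption; the CTE name Signatures and the column alias nodes are made up for illustration, and the sample rows are the ones from the question.

```sql
-- Assumption: SQL Server 2017+ (STRING_AGG ... WITHIN GROUP is not available earlier).
DECLARE @DataSource TABLE
(
    [routeID] TINYINT
   ,[nodeID] TINYINT
   ,[nodeName] CHAR(1)
);

-- Sample rows copied from the question.
INSERT INTO @DataSource ([routeID], [nodeID], [nodeName])
VALUES   (1, 1, 'a'), (1, 2, 'b')
        ,(2, 1, 'a'), (2, 2, 'b')
        ,(3, 1, 'a'), (3, 2, 'b')
        ,(4, 1, 'a'), (4, 2, 'c')
        ,(5, 1, 'a'), (5, 2, 'c')
        ,(6, 1, 'a'), (6, 2, 'b'), (6, 3, 'd')
        ,(7, 1, 'a'), (7, 2, 'b'), (7, 3, 'd');

WITH Signatures AS
(
    -- one row per route: its node names concatenated in nodeID order
    SELECT [routeID]
          ,STRING_AGG([nodeName], ',') WITHIN GROUP (ORDER BY [nodeID]) AS [nodes]
    FROM @DataSource
    GROUP BY [routeID]
)
-- keep the smallest routeID for every distinct node sequence
SELECT MIN([routeID]) AS [routeID]
      ,[nodes]
FROM Signatures
GROUP BY [nodes]
ORDER BY MIN([routeID]);
```

With the sample data this should return routes 1, 4 and 6 with the signatures a,b, a,c and a,b,d. Note that the separator must be a character that cannot occur in nodeName, otherwise two different sequences could collapse into the same string.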

OK, let's use some recursion to build the complete list of nodes for each routeID.

First, let's populate the source table and the journeys table:

 -- your source  
declare @r as table (routeID int, nodeID int, nodeName char(1))

-- your other table  
declare @j as table (journeyID int, nodeID int, nodeName char(1), routeID int) 

 -- temp results table  
declare @routes as table (routeID int primary key, nodeNames varchar(1000))

;with
s as (
    select *
    from (
        values
        (1,       1,      'a'),
        (1,       2,      'b'),
        (2,       1,      'a'),
        (2,       2,      'b'),
        (3,       1,      'a'),
        (3,       2,      'b'),
        (4,       1,      'a'),
        (4,       2,      'c'),
        (5,       1,      'a'),
        (5,       2,      'c'),
        (6,       1,      'a'),
        (6,       2,      'b'),
        (6,       3,      'd'),
        (7,       1,      'a'),
        (7,       2,      'b'),
        (7,       3,      'd') 
    ) s  (routeID, nodeID, nodeName)
)
insert into @r
select *
from s

;with
s as (
    select *
    from (
        values 
        (1,        1,       'a',       1),
        (1,        2,       'b',       1),
        (2,        1,       'a',       1),
        (2,        2,       'b',       1),
        (3,        1,       'a',       4),
        (3,        2,       'c',       4)
    ) s  (journeyID, nodeID, nodeName, routeID)
)
insert into @j
select *
from s
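
Now let's build the routes:

;with
d as (
    -- n2 = 1 marks the last node of each route (highest nodeID)
    select *, row_number() over (partition by r.routeID order by r.nodeID desc) n2
    from @r r
),
r as (
    -- anchor: start every route at its first node (assumes nodeIDs start at 1)
    select d.*, cast(nodeName as varchar(1000)) Names, cast(0 as bigint) i2
    from d
    where nodeId=1
    union all
    -- recursive step: append the next node's name (assumes consecutive nodeIDs)
    select d.*, cast(r.names + ',' + d.nodeName as varchar(1000)), r.n2
    from d
    join r on r.routeID = d.routeID and r.nodeId=d.nodeId-1 
)
-- keep only the row that reached the last node; it holds the full list
insert into @routes
select routeID, Names
from r
where n2=1

The @routes table looks like this: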

routeID nodeNames
1       'a,b'
2       'a,b'
3       'a,b'
4       'a,c'
5       'a,c'
6       'a,b,d'
7       'a,b,d'

And now the final output:

-- the unique routes 
select MIN(r.routeID) routeID, nodeNames
from @routes r
group by nodeNames

-- the unique journeys
select MIN(journeyID) journeyID, r.nodeNames
from @j j
inner join @routes r on j.routeID = r.routeID
group by nodeNames

Output:

routeID nodeNames
1       'a,b'
4       'a,c'
6       'a,b,d'
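
To carry the result over to the journeys table, one option is to map every routeID to a representative route (here the smallest routeID with the same node list) and join on that mapping. A sketch, assuming @routes has been filled as above; it is redeclared here so the batch runs on its own, the journey rows are hypothetical (chosen so that journeys 2 and 3 use the duplicate routes 2 and 5), and the names canonical and canonicalRouteID are invented for this example:

```sql
-- Same shape as the @routes table above, redeclared so this batch is self-contained.
DECLARE @routes TABLE (routeID INT PRIMARY KEY, nodeNames VARCHAR(1000));
INSERT INTO @routes (routeID, nodeNames)
VALUES (1, 'a,b'), (2, 'a,b'), (3, 'a,b')
      ,(4, 'a,c'), (5, 'a,c')
      ,(6, 'a,b,d'), (7, 'a,b,d');

-- Hypothetical, simplified journeys: only the columns needed for the join.
DECLARE @j TABLE (journeyID INT, routeID INT);
INSERT INTO @j (journeyID, routeID)
VALUES (1, 1), (2, 2), (3, 5);

WITH canonical AS
(
    -- every route is paired with the smallest routeID sharing its node list
    SELECT routeID
          ,nodeNames
          ,MIN(routeID) OVER (PARTITION BY nodeNames) AS canonicalRouteID
    FROM @routes
)
SELECT j.journeyID
      ,j.routeID
      ,c.canonicalRouteID
      ,c.nodeNames
FROM @j j
INNER JOIN canonical c
    ON c.routeID = j.routeID
ORDER BY j.journeyID;
```

Journeys 2 and 3 come back with canonicalRouteID 1 and 4 respectively, so anything downstream only ever sees routes 1, 4 and 6.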


Please show your expected result; I'm having trouble understanding what you want. Do you want each routeID shown once, with a comma-separated list of nodeIDs in the next column? Is there a limit on the number of nodes per routeID?

First, I want the source table to contain only unique routes; in this example, only routes 1, 4 and 6 would remain. Next, I want to join this table with another table that contains node IDs and node names, so that I can pass along the data about routeID.