SQL: Finding connected subsets in a database table [sql, duplicate-removal]


In my table I have some records that match one another:

644432 738987
738987 644432
..
854313 871860
854313 874411
871860 854313
871860 874411
874411 854313
874411 871860

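For example, 644432 matches 738987, and 738987 obviously matches 644432. To me they are the same thing, and I need to get one and only one of them, either 644432 or 738987.

As another example, 854313 matches 871860 and also matches 874411, which is why I have 6 records.

In the end I need to get only two records. How can I do that?

Sorry for my English; if my question is unclear, please let me know. Thanks.

For this example, here is code that populates the table: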

DECLARE @DataTable TABLE (ColA  INT, ColB  INT)
insert into @DataTable  values 
(644432,    738987),
(738987,    644432),
(854313,    871860),
(854313,    874411),
(871860,    854313),
(871860,    874411),
(874411,    854313),
(874411,    871860)
select * from @DataTable

Assuming this is a table named DataTable with two columns, ColA and ColB, you can do the following:

select distinct Smallest,Largest from
(
  select case when ColA > ColB then ColB else ColA end as Smallest,
  case when ColA > ColB then ColA else colB end as Largest
  from DataTable
) minmax

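This uses the inner select to rearrange the values so that the smaller value is always in the first column and the larger one in the second. The outer select then pulls out only the distinct pairs.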
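To see what this normalization returns on the sample data, here is a minimal sketch that runs the same query through Python's built-in `sqlite3` module. SQLite is an assumption made only so the example is self-contained; the asker is on SQL Server, where the `CASE` logic is the same.

```python
import sqlite3

# Sample rows from the question.
rows = [
    (644432, 738987), (738987, 644432),
    (854313, 871860), (854313, 874411),
    (871860, 854313), (871860, 874411),
    (874411, 854313), (874411, 871860),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataTable (ColA INT, ColB INT)")
conn.executemany("INSERT INTO DataTable VALUES (?, ?)", rows)

# Put the smaller value first, the larger second, then keep distinct pairs.
normalized = conn.execute("""
    SELECT DISTINCT Smallest, Largest FROM (
        SELECT CASE WHEN ColA > ColB THEN ColB ELSE ColA END AS Smallest,
               CASE WHEN ColA > ColB THEN ColA ELSE ColB END AS Largest
        FROM DataTable
    ) minmax
    ORDER BY Smallest, Largest
""").fetchall()

for pair in normalized:
    print(pair)
```

The mirrored pairs collapse, but the three-member group still yields three rows, so this step removes direct duplicates only and does not merge a whole group into one record.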

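Try this:

select col1, col2
from (select col1, col2, col1 + col2 as indicator from table1) t
group by indicator

Note: this will not work if two different rows have the same sum.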
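The caveat about equal sums can be illustrated with a small sketch in plain Python. The pairs below are hypothetical values chosen to collide; they are not taken from the question.

```python
# Hypothetical pairs: (1, 4) and (2, 3) are unrelated, yet both sum to 5.
pairs = [(1, 4), (4, 1), (2, 3), (3, 2)]

# Group the rows by the sum of their two columns, as the query above does.
groups = {}
for a, b in pairs:
    groups.setdefault(a + b, []).append((a, b))

for indicator, members in groups.items():
    print(indicator, members)
```

All four rows land under the single indicator 5, so grouping by the sum would report one record where two unrelated groups exist.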

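A recursive query for finding the connected sets. For each chain, the item with the smallest number is reported as the group leader.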

select col1,col2 from(select a.col1 col1,a.col2 col2,rownum rn from tbl a, tbl b 
where a.col1||a.col2=(b.col2||b.col1)) where mod(rn,2)<>0
union
select a.col1 col1,a.col2 col2 from tbl a left outer join tbl b on 
a.col1||a.col2=(b.col2||b.col1) where b.col1 is null

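The query first orders the members of each pair, then looks for chains of connected components. This method will not work if a cluster has more than one starting point, but it does avoid loops.

This syntax is for PostgreSQL. For Microsoft, omit the RECURSIVE keyword; for Oracle, use CONNECT BY and PRIOR. YMMV.

Result: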

CREATE TABLE
INSERT 0 8
  top   | n_members 
--------+-----------
 644432 |         2
 854313 |         8
(2 rows)
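As an illustration of the recursive approach described above, here is a minimal sketch of one way to write such a query. It is not the answerer's original SQL. It assumes SQLite (driven through Python's `sqlite3` so it can be run as is), and the CTE names `edges`, `reach`, and `leader` are made up for this example. It treats every pair as an undirected link and relies on `UNION` (rather than `UNION ALL`) to discard rows it has already produced, so the recursion stops even when a cluster contains a loop.

```python
import sqlite3

rows = [
    (644432, 738987), (738987, 644432),
    (854313, 871860), (854313, 874411),
    (871860, 854313), (871860, 874411),
    (874411, 854313), (874411, 871860),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataTable (ColA INT, ColB INT)")
conn.executemany("INSERT INTO DataTable VALUES (?, ?)", rows)

clusters = conn.execute("""
    WITH RECURSIVE
    -- Every link in both directions.
    edges(a, b) AS (
        SELECT ColA, ColB FROM DataTable
        UNION
        SELECT ColB, ColA FROM DataTable
    ),
    -- All nodes reachable from each starting node; UNION removes repeats.
    reach(root, node) AS (
        SELECT a, a FROM edges
        UNION
        SELECT r.root, e.b FROM reach r JOIN edges e ON e.a = r.node
    ),
    -- The smallest node that can reach a given node is its group leader.
    leader(node, top) AS (
        SELECT node, MIN(root) FROM reach GROUP BY node
    )
    SELECT top, COUNT(*) AS n_members
    FROM leader
    GROUP BY top
    ORDER BY top
""").fetchall()

for top, n_members in clusters:
    print(top, n_members)
```

This sketch counts distinct members per group, so its `n_members` values need not match the output shown above, which came from a different query.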

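OK, the example below follows links one level deep. Of course, this could be cleaned up considerably with a stored procedure or by building the query from code, which would make it easier to add extra levels of link following.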

-- Set up an example table
create table DataTable
(
    A int,
    B int
)
GO

insert into DataTable values(644432,738987)
insert into DataTable values(738987,644432)
insert into DataTable values(854313,871860)
insert into DataTable values(854313,874411)
insert into DataTable values(871860,854313)
insert into DataTable values(871860,874411)
insert into DataTable values(874411,854313)
insert into DataTable values(874411,871860)
GO

-- Strip out initial duplicates
select distinct A,B into Pass1
from
(
  select case when A > B then B else A end as A,
  case when A > B then A else B end as B
  from DataTable
) minmax

-- Create a copy that we will update with links between values
select * into Pass2 from Pass1 order by A

update Pass2 set B=x.NewB from
(
  select L.A as OldA,L.B as OldB, R.B as NewB
  from Pass1 L
  inner join Pass1 R on L.B = R.A
) x
where Pass2.A=x.OldA and Pass2.B=x.OldB

update Pass2 set A=x.NewA from
(
  select L.B as OldA, R.B as OldB, L.A as NewA
  from Pass1 L
  inner join Pass1 R on L.B = R.A
) x
where Pass2.A=x.OldA and Pass2.B=x.OldB

-- Dedupe any newly created duplicates
select distinct A,B
from
(
  select case when A > B then B else A end as A,
  case when A > B then A else B end as B
  from Pass2
) minmax

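Comments:

Welcome to Stack Overflow! Please specify the target RDBMS by adding the appropriate tag (Oracle, SQL Server, MySQL, etc.). There may be answers that take advantage of language or product features that are not universally supported. Also, by using a specific RDBMS tag, your question may get attention from people better suited to answer it.

If you had a choice, which record would you want?

No preference, I just want one from each group, and then to go for lunch :) Thanks in advance.

You want to isolate clusters of connected records and report one result for each cluster found? BTW, there is a loop in the second cluster, which makes a recursive solution harder.

Please include the desired result based on the data already given.

With your logic I end up with more than two records. The first group works but the second does not. The result I get is:
Smallest Largest
644432 738987 >> good, because 738987 644432 was removed
854313 871860 >> 1/3
854313 874411 >> 2/3
871860 874411 >> 3/3
I want only one of 1/3, 2/3 or 3/3 :(

Sorry, what I wrote only identifies direct duplicates; it does not handle the more complex second kind of match.

OK, no need to be sorry, I'll explain again: records 644432 and 738987 have something in common, which is why the previous code returns two records for them. 871860, 854313 and 874411 also have something in common, which is why we have 6 records. In the first case I want only 644432 or 738987, and in the second only 871860, 854313 or 874411 >> 2 records in total, one per group.

I've added a new answer that works when you only need to follow one level of links between the numbers.

Thank you very much, this works for me, even though I don't really understand "you only need to follow one level of links between the numbers"... but it works, and my data all looks like the example I gave.

Invalid column name 'indicator' :( However, "choose a job you love and you will never have to work a day in your life". +1

@user2190624 Thanks for your comment. I changed the group by.

Still doesn't work :( I've added code to the question to provide an example.

Thanks for the reply. Execute the code and see (change @@@ to @): DECLARE @@@DataTable TABLE (ColA INT, ColB INT) insert into @@@DataTable values (...) select * from @@@DataTable select ColA, ColB from (select ColA + ColB as indicator, ColA, ColB from @@@DataTable) a group by indicator

@user2190624 It executed successfully. Which database are you using? It works for me in MySQL.

Not working, unfortunately. I've added code to the question to provide an example.

I tried the same query on the data you specified... it runs fine for me. By the way, which RDBMS are you using? The query above works fine in Oracle.

SQL Server 2008.

You get two rows as the result? That is the result I want, but I work in SQL Server 2008 R2. If only I could tell my boss to change it right now :) I'm trying to understand all the algorithms you used.

As I said, for Microsoft you should remove the RECURSIVE keyword. There may be some other small changes as well.

Yes, because of the result, I suggest reading all the answers :) Sorry. OK, I'll try to make those changes, I hope it works :) Thanks.

Don't translate it to Microsoft.

It's legen-wait for it-dary!!! Thank you very much Stony, and thanks to you too!

What I mean by one level of linking is that if you have links like 1-2, 2-3, 3-4, 4-5, it won't fully work.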
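For chains of arbitrary length such as 1-2, 2-3, 3-4, 4-5, one alternative is to do the grouping in application code rather than in SQL. The sketch below is not from any of the answers. It uses a union-find (disjoint-set) structure in plain Python and assumes the pairs have already been fetched from the table; the function names `find` and `connected_groups` are made up for this example.

```python
def find(parent, x):
    """Return the representative of x's group, shortening paths as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def connected_groups(pairs):
    """Map each group's smallest member to the sorted list of its members."""
    parent = {}
    for a, b in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            # Attach the larger root to the smaller one so that the
            # representative is always the smallest member of the group.
            parent[max(ra, rb)] = min(ra, rb)
    groups = {}
    for x in parent:
        groups.setdefault(find(parent, x), []).append(x)
    return {leader: sorted(members) for leader, members in groups.items()}


# A chain several levels deep ends up as a single group.
chain = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(connected_groups(chain))

# The sample data from the question gives two groups.
sample = [
    (644432, 738987), (738987, 644432),
    (854313, 871860), (854313, 874411),
    (871860, 854313), (871860, 874411),
    (874411, 854313), (874411, 871860),
]
print(connected_groups(sample))
```

Picking one record per group then amounts to taking the keys of the returned dictionary.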
