SQL: select pairs of rows that match on at least N columns


sql postgresql


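I have run into an unusual SQL problem.

Suppose I have a table T1 with a number of columns, named A…Z for simplicity.

What I need is to find all pairs of rows that match on at least N attributes.

Let's look at a very simple example: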

 ID | A | B | C | D
----+---+---+---+---
  1 | 1 | 2 | 3 | 4
  2 | 2 | 3 | 4 | 1
  3 | 1 | 2 | 2 | 1

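In this case, N = 2.

Row 1 should match row 3, because they agree on columns A and B. Row 2 has no match. Row 3 matches row 1, because the relation is symmetric.

Do you know how to do this?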

I would suggest the following. You can of course add as many fields to the list of case expressions as you need:

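select id1, id2, matches from
(
select
     tab1.id id1, tab2.id id2,
     -- one case expression per compared column
     (case when tab1.a=tab2.a then 1 else 0 end)+
     (case when tab1.b=tab2.b then 1 else 0 end) matches
from
           t1 tab1
      join t1 tab2 on tab1.id <> tab2.id  -- do not pair a row with itself
) pairs  -- PostgreSQL versions before 16 require an alias for a subquery in FROM
where matches >= 2;  -- N = 2 in the example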

You get the IDs of the matching rows and the number of matching fields.
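Writing one case expression per column by hand gets tedious for wide tables. As a rough sketch, the query text could be generated instead. The Python helper below is hypothetical (`build_match_query` and its parameters are not part of the answer), and it assumes the table and column names come from a trusted list, since they are interpolated directly into the SQL. It joins on `id <`, so each unordered pair is listed once.

```python
def build_match_query(table, columns, min_matches, id_col="id"):
    """Build a PostgreSQL query listing pairs of rows that agree on at least
    min_matches of the given columns (hypothetical helper, trusted names only)."""
    if not columns:
        raise ValueError("need at least one column to compare")
    # One CASE expression per compared column, summed into a match count.
    terms = [
        f"(case when tab1.{c} = tab2.{c} then 1 else 0 end)"
        for c in columns
    ]
    match_expr = "\n       + ".join(terms)
    return (
        "select id1, id2, matches from (\n"
        f"  select tab1.{id_col} as id1, tab2.{id_col} as id2,\n"
        f"         {match_expr} as matches\n"
        f"  from {table} tab1\n"
        # id1 < id2 lists each unordered pair once and skips self-pairs
        f"  join {table} tab2 on tab1.{id_col} < tab2.{id_col}\n"
        ") pairs\n"
        f"where matches >= {int(min_matches)};"
    )


print(build_match_query("t1", ["a", "b", "c", "d"], 2))
```

The generated text can then be run with any PostgreSQL client. The threshold is passed through `int()` so that only a number ever reaches the query.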

Using a self-join on the table, you can find the number of matching values for every pair of rows:

with my_table(id, a, b, c, d) as (
values
    (1, 1, 2, 3, 4),
    (2, 2, 3, 4, 1),
    (3, 1, 2, 2, 1)
)

select 
    t1.id, t2.id, 
    (t1.a = t2.a)::int+ (t1.b = t2.b)::int+ (t1.c = t2.c)::int+ (t1.d = t2.d)::int as matches
from my_table t1
join my_table t2 on t1.id < t2.id;

 id | id | matches 
----+----+---------
  1 |  2 |       0
  1 |  3 |       2
  2 |  3 |       1
(3 rows)
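To answer the original question, the match count still has to be compared with N. Below is a small runnable sketch of that step with the sample data from the question. It uses Python's standard `sqlite3` module as a stand-in for PostgreSQL, which is an assumption made for portability: SQLite evaluates a comparison to 0 or 1 directly, so the `::int` casts are not needed there. The function name `pairs_with_min_matches` is made up for this illustration.

```python
import sqlite3


def pairs_with_min_matches(rows, min_matches):
    """Return (id1, id2, matches) for row pairs agreeing on >= min_matches columns."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table my_table (id integer, a integer, b integer, c integer, d integer)"
    )
    con.executemany("insert into my_table values (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        select id1, id2, matches
        from (
            -- same self-join as above; in SQLite each comparison yields 0 or 1
            select t1.id as id1, t2.id as id2,
                   (t1.a = t2.a) + (t1.b = t2.b) + (t1.c = t2.c) + (t1.d = t2.d) as matches
            from my_table t1
            join my_table t2 on t1.id < t2.id
        ) as pairs
        where matches >= ?
        order by id1, id2
        """,
        (min_matches,),
    ).fetchall()
    con.close()
    return result


sample = [(1, 1, 2, 3, 4), (2, 2, 3, 4, 1), (3, 1, 2, 2, 1)]
print(pairs_with_min_matches(sample, 2))  # [(1, 3, 2)]
```

In PostgreSQL the same filter can be applied by wrapping the query above in a subquery, keeping the `::int` casts, and adding `where matches >= 2` on the outside.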

You can also use a function on a transformed version of the table (find_matches is defined further down):

with my_table(id, a, b, c, d) as (
values
    (1, 1, 2, 3, 4),
    (2, 2, 3, 4, 1),
    (3, 1, 2, 2, 1)
),
-- turn every row into (id, array of all other column values)
my_table_transformed (id, cols) as (
    -- order by key keeps the array positions aligned across rows
    select id, array_agg(value::int order by key)
    from my_table,
    to_jsonb(my_table) j,
    jsonb_each_text(j)
    where key <> 'id'
    group by 1
)
select t1.id, t2.id, find_matches(t1.cols, t2.cols)
from my_table_transformed t1
join my_table_transformed t2 on t1.id < t2.id;

 id | id | find_matches 
----+----+--------------
  1 |  2 |            0
  1 |  3 |            2
  2 |  3 |            1
(3 rows)

The last query works for tables with any number of columns.

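Do the matching columns have to be adjacent, or can they be any pair, such as A and C?

When N is a dynamic value, I'm afraid there is no answer. But if N is given as a fixed value, you can compress the columns first, and then it is easy to retrieve the required data.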

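The find_matches function called in the query above counts the positions at which two arrays hold equal values:

create or replace function find_matches(a1 int[], a2 int[])
returns int language sql as $$
    -- unnest(a1, a2) pairs up the elements of both arrays by position
    select sum(m)::int
    from (
        select (c1 = c2)::int as m
        from unnest(a1, a2) u(c1, c2)
    ) s
$$;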
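The logic of the function is a position-by-position comparison followed by a count. As an illustration only, here is a rough Python equivalent. It is not part of the answer, and it differs in one detail: where the SQL version returns NULL when there is nothing to sum, this sketch returns 0.

```python
def find_matches(a1, a2):
    """Count positions where both sequences hold equal, non-None values."""
    return sum(
        1
        for c1, c2 in zip(a1, a2)
        # a comparison involving NULL never counts as a match in SQL,
        # so None values are skipped here as well
        if c1 is not None and c2 is not None and c1 == c2
    )


print(find_matches([1, 2, 3, 4], [1, 2, 2, 1]))  # 2
```

The sample call compares rows 1 and 3 of the example table, which agree on columns a and b.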
