Group by overlapping arrays in PostgreSQL


I want to write a query that groups rows by overlapping arrays. Consider the following example:

 id | name    | family_ids
--------------------------
 1  | Alice   | [f1, f2]
 2  | Bob     | [f1]
 3  | Freddy  | [f2, f3]
 4  | James   | [f3]
 5  | Joe     | [f4, f5]
 6  | Tim     | [f5]
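Alice and Bob are part of the same family. Through Alice's in-law family f2, she is also related to Freddy. And through Freddy's family f3, James is related to them as well.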

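So, basically, I want to group rows whose family_ids arrays overlap. Note, however, that the link f2 -> f3 should also be discovered, which is not possible with a simple GROUP BY query.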
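
To make the example reproducible, here is a minimal setup sketch. The table name mytable is taken from the answer's query; the column types (int, text, text[]) are assumptions, since the question does not state them. The final query lists only the pairs of rows whose arrays overlap directly, using PostgreSQL's array overlap operator &&.

```sql
-- Hypothetical setup: table name from the answer's query, column types assumed.
create table mytable (
    id         int primary key,
    name       text,
    family_ids text[]
);

insert into mytable (id, name, family_ids) values
    (1, 'Alice',  array['f1', 'f2']),
    (2, 'Bob',    array['f1']),
    (3, 'Freddy', array['f2', 'f3']),
    (4, 'James',  array['f3']),
    (5, 'Joe',    array['f4', 'f5']),
    (6, 'Tim',    array['f5']);

-- Pairs of rows that share at least one family id directly.
-- The a.id < b.id condition keeps each pair once and drops self-pairs.
select a.id as id1, b.id as id2
from mytable a
inner join mytable b
    on a.family_ids && b.family_ids
   and a.id < b.id
order by a.id, b.id;
```

On the sample data this yields the pairs (1,2), (1,3), (3,4) and (5,6). Bob (2) and James (4) never appear together, even though they belong to the same group through Alice and Freddy, which is why a single join or GROUP BY is not enough.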


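I have been experimenting a lot with inner joins, grouping by t1.family_ids and t2.family_ids, but I can't seem to find a solution that performs well. The table currently has about 100k rows, and it will grow to about 500k-1M rows in the future.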

This is a graph-walking problem.

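A common approach is to unnest the arrays to generate the nodes, then self-join on matching families to compute all the edges. We can then use a recursive query to traverse the graph, taking care not to visit the same node twice, and aggregate to generate the groups. The last step is to recover the corresponding family ids: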

with recursive
    nodes as (
        select t.id, x.family_id
        from mytable t
        cross join lateral unnest(t.family_ids) as x(family_id)
    ),
    edges as (
        select n1.id as id1, n2.id as id2
        from nodes n1
        inner join nodes n2 using (family_id)
    ),
    cte as (
        select id1, id2, array[id1] as visited 
        from edges
        where id1 = id2
        union all 
        select c.id1, e.id2, c.visited || e.id2
        from cte c
        inner join edges e on e.id1 = c.id2
        where e.id2 <> all(c.visited)
    ),
    res as (
        select id1, array_agg(distinct id2 order by id2) as id2s
        from cte
        group by id1
    )
select 
    array_agg(distinct n.id order by n.id) as ids, 
    array_agg(distinct n.family_id order by n.family_id) as family_ids
from res r
inner join nodes n on n.id = r.id1
group by r.id2s
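
Since the table is expected to grow, one possible variant is sketched below. It is not part of the original answer and has not been benchmarked. Instead of tracking a visited array per path, it lets UNION (rather than UNION ALL) discard pairs that were already produced, which is what ends the recursion. Each row is then labelled with the smallest id it can reach. The CTE names reach and comp and the column group_id are hypothetical names chosen for this sketch.

```sql
with recursive
    nodes as (
        select t.id, x.family_id
        from mytable t
        cross join lateral unnest(t.family_ids) as x(family_id)
    ),
    edges as (
        -- distinct removes repeated pairs when two rows share several families
        select distinct n1.id as id1, n2.id as id2
        from nodes n1
        inner join nodes n2 using (family_id)
    ),
    reach as (
        -- every (start, reachable) pair; UNION drops pairs seen before,
        -- so the recursion stops once no new pair appears
        select id1, id2 from edges
        union
        select r.id1, e.id2
        from reach r
        inner join edges e on e.id1 = r.id2
    ),
    comp as (
        -- the smallest reachable id serves as the group label
        select id1 as id, min(id2) as group_id
        from reach
        group by id1
    )
select
    array_agg(distinct n.id order by n.id) as ids,
    array_agg(distinct n.family_id order by n.family_id) as family_ids
from comp c
inner join nodes n on n.id = c.id
group by c.group_id
order by c.group_id;
```

This still materialises every reachable pair, so a single very large group remains quadratic in its size. As in the query above, rows with an empty family_ids array produce no nodes and do not appear in the result.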


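Well, performance is a concern. Whatever algorithm you design, it leads to n-to-n comparisons and therefore n² complexity. I'm not sure any result will make you happy…
You are a lifesaver! This works brilliantly and is fast! Thanks!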

For the sample data, the query returns:

 ids       | family_ids
-----------+------------
 {1,2,3,4} | {f1,f2,f3}
 {5,6}     | {f4,f5}