Python: efficiently finding connected components of an undirected graph under transitive equivalence

python graph computer-science

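I have a set of nodes and a function foo(u, v) that determines whether two nodes are equal. By "equal" I mean transitive equivalence: if 1==2 and 2==3 then 1==3, and also: if 1==2 and 1!=4 then 2!=4.

Given a set of nodes, I can find all connected components in the graph by passing every possible combination of nodes to foo(u, v) and building the required edges. (The version below returns predetermined results for demonstration only; it is not the real function!) Like this: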

    import itertools

    import networkx as nx
    from matplotlib import pyplot as plt


    def foo(u, v):
        # this function is simplified, in reality it will do a complex
        # calculation to determine whether nodes are equal.
        EQUAL_EDGES = {(1, 2), (2, 3), (1, 3), (4, 5)}
        return (u, v) in EQUAL_EDGES


    def main():
        g = nx.Graph()
        g.add_nodes_from(range(1, 5 + 1))

        for u, v in itertools.combinations(g.nodes, 2):
            are_equal = foo(u, v)
            print('{u}{sign}{v}'.format(u=u, v=v, sign='==' if are_equal else '!='))
            if are_equal:
                g.add_edge(u, v)

        conn_comps = nx.connected_components(g)
        nx.draw(g, with_labels=True)
        plt.show()
        return conn_comps


    if __name__ == '__main__':
        main()

The problem with this approach is that I get many redundant checks that I would like to avoid:

    1==2  # ok
    1==3  # ok
    1!=4  # ok
    1!=5  # ok
    2==3  # redundant check, if 1==2 and 1==3 then 2==3
    2!=4  # redundant check, if 1!=4 and 1==2 then 2!=4
    2!=5  # redundant check, if 1!=5 and 1==2 then 2!=5
    3!=4  # redundant check, if 1!=4 and 1==3 then 3!=4
    3!=5  # redundant check, if 1!=5 and 1==3 then 3!=5
    4==5  # ok

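I want to avoid running in O(n^2) time complexity. What is the right way to efficiently find all connected components with a custom foo(u, v) function (or is there an existing function in any Python library)?
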
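It is not clear what you are really trying to do, but here is a solution that checks each node against only one element of every equivalence group: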

    nodes2place = range(1, 6)
    cclist = []

    for u in nodes2place:
        node_was_placed = False
        for icc in range(len(cclist)):
            if foo(u, cclist[icc][0]):
                cclist[icc].append(u)
                node_was_placed = True
                break

        # node doesn't fit into existing cc so make a new one
        if not node_was_placed:
            cclist.append([u])
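The same idea can be wrapped in a reusable function. The following is a minimal sketch rather than the answerer's exact code: `group_by_representative`, `EQUAL_PAIRS` and the call-recording `foo` are hypothetical names introduced here for illustration, and `foo` again uses hardcoded results in place of the real expensive check. It assumes `foo` is a true equivalence relation, so that comparing against a single member of each group is sufficient.

```python
def group_by_representative(nodes, are_equal):
    """Partition nodes into equivalence classes.

    Each node is compared only against the first member of every
    existing class, which is valid when are_equal is an equivalence
    relation.
    """
    classes = []
    for node in nodes:
        for members in classes:
            if are_equal(members[0], node):
                members.append(node)
                break
        else:
            # no existing class matched, so start a new one
            classes.append([node])
    return classes


# hypothetical hardcoded results standing in for the expensive check
EQUAL_PAIRS = {frozenset(p) for p in [(1, 2), (2, 3), (1, 3), (4, 5)]}
calls = []  # records every pair actually passed to foo


def foo(u, v):
    calls.append((u, v))
    return frozenset((u, v)) in EQUAL_PAIRS


components = group_by_representative(range(1, 6), foo)
print(components)   # [[1, 2, 3], [4, 5]]
print(len(calls))   # 5 calls instead of the 10 pairwise checks
```

The number of calls depends on how many classes exist: each node costs at most one call per class found so far.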

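You can keep track of which edges are transitively equal or unequal in two corresponding dictionaries. For each edge combination, a few simple O(1) checks tell you whether the computation is redundant. If it is not, you compute the result from first principles and then, depending on whether the edge is equal or unequal, update the dictionaries with the necessary information. You still have to go through C(n, 2) equality checks, since that is the number of combinations you iterate over, but for many of them the decision can be made immediately.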

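The equal_edges dictionary is easier to explain, so let's start with it. The pair 1-2 is equal, but since neither 1 nor 2 exists as a key (the dict is empty at this point), we create the set {1, 2} and attach it to both equal_edges[1] and equal_edges[2]. Then we come across the equal pair 1-3. Since equal_edges[1] now exists, we add 3 to its set of transitively equal nodes. Because this set is shared between nodes 1 and 2, it is updated in both places. We must now also attach the same set to equal_edges[3]. All three keys refer to the same set in memory, i.e. {1, 2, 3}, so no data is duplicated. Now, when the equal pair 2-3 comes up, either 3 in equal_edges[2] or 2 in equal_edges[3] lets us bypass any heavy computation.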
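The sharing described above relies on Python dictionaries storing references to one set object rather than copies of it. A small sketch of just that mechanism follows; the variable name mirrors the answer, but the snippet is only illustrative and `shared` is a name made up here.

```python
equal_edges = {}

# pair 1-2: neither key exists yet, so both keys get the same new set
shared = {1, 2}
equal_edges[1] = shared
equal_edges[2] = shared

# pair 1-3: extend the existing set and point key 3 at it as well
equal_edges[1].add(3)
equal_edges[3] = equal_edges[1]

# all three keys refer to one set object, so the update is visible everywhere
print(equal_edges[1] is equal_edges[2] is equal_edges[3])  # True
print(3 in equal_edges[2])  # True, so pair 2-3 needs no computation
```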
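For unequal_edges the logic is somewhat similar, but we must also consult the equal_edges dictionary to find transitively unequal edges. For example, the pair 1-4 is unequal. But since 1 is transitively equal to both 2 and 3, we must have unequal_edges[4] = equal_edges[1]. Setting unequal_edges[1] = {4} or unequal_edges[2] = {4}, and so on, would be redundant, because that information can be obtained from unequal_edges[4]. It just means that for a transitively unequal pair a-b we need a double check, i.e. a in unequal_edges[b] or b in unequal_edges[a].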

    from itertools import combinations

    equal_edges = {}
    unequal_edges = {}

    def update_equal_edges(a, b):

        def update_one(a, b):
            # add b to a's set and let b share that same set object
            equal_edges[a].add(b)
            equal_edges[b] = equal_edges[a]

        exists_a = a in equal_edges
        exists_b = b in equal_edges

        if not (exists_a or exists_b):
            s = set((a, b))
            equal_edges[a] = s
            equal_edges[b] = s
        elif exists_a and not exists_b:
            update_one(a, b)
        elif exists_b and not exists_a:
            update_one(b, a)
        # note: nothing happens when both a and b already have entries

    def update_unequal_edges(a, b):
        exists_a = a in equal_edges
        exists_b = b in equal_edges

        if not (exists_a or exists_b):
            s = set((a, b))
            unequal_edges[a] = s
            unequal_edges[b] = s
        elif exists_a and not exists_b:
            unequal_edges[b] = equal_edges[a]
        elif exists_b and not exists_a:
            unequal_edges[a] = equal_edges[b]

    def are_equal_edges(a, b):
        # O(1) lookups first: is the answer already implied?
        if a in equal_edges.get(b, []):
            print('{}=={} # redundant'.format(a, b))
            return True

        if (a in unequal_edges.get(b, [])) or (b in unequal_edges.get(a, [])):
            print('{}!={} # redundant'.format(a, b))
            return False

        # hardcoded equal edges which are the result
        # of some complex computations
        are_equal = (a, b) in {(1, 2), (1, 3), (4, 5)}
        if are_equal:
            update_equal_edges(a, b)
        else:
            update_unequal_edges(a, b)
        print('{}{}{} # ok'.format(a, '==' if are_equal else '!=', b))
        return are_equal

The print statements are there for demonstration purposes. If you run

    for a, b in combinations(range(1, 6), 2):
        are_equal_edges(a, b)

you will get the following result:

    1==2 # ok
    1==3 # ok
    1!=4 # ok
    1!=5 # ok
    2==3 # redundant
    2!=4 # redundant
    2!=5 # redundant
    3!=4 # redundant
    3!=5 # redundant
    4==5 # ok
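The `update_equal_edges` helper above does nothing when both nodes already have entries, so two groups that were built separately are never merged. That case does not arise in the run shown, but it could with other inputs or iteration orders. One common way to handle merging is a disjoint-set (union-find) structure. Note that this is a different technique from the answer's dictionaries, and the `DisjointSet` class below is a hypothetical sketch written for illustration, not part of any library used here.

```python
class DisjointSet:
    """Minimal union-find over a fixed collection of hashable items."""

    def __init__(self, items):
        # every item starts as the root of its own one-element group
        self.parent = {x: x for x in items}

    def find(self, x):
        # walk up to the root of x's group
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # path compression: point every visited node directly at the root
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        # merge the two groups by attaching one root to the other
        self.parent[self.find(a)] = self.find(b)

    def same(self, a, b):
        return self.find(a) == self.find(b)


ds = DisjointSet(range(1, 6))
for a, b in [(1, 2), (4, 5), (3, 1)]:  # pairs already known to be equal
    ds.union(a, b)

print(ds.same(2, 3))  # True: implied by 1==2 and 3==1
print(ds.same(3, 4))  # False: nothing links the two groups
```

A check such as `ds.same(a, b)` can then be done before calling the expensive function, in the same spirit as the dictionary lookups above. It only detects known equalities, so known inequalities would still need separate bookkeeping.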

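You could use 0 to represent equality and math.inf to represent inequality as edge weights. Then, for each node pair u, v, you can inspect the path from u to v and, based on the result, decide whether the (heavy) node check needs to be invoked: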

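    import itertools as it
    import math

    import networkx as nx

    def func(u, v):
        # stand-in for the expensive equality check (foo in the question)
        return (u, v) in {(1, 2), (2, 3), (1, 3), (4, 5)}

    g = nx.Graph()
    g.add_nodes_from(range(1, 6))
    for u, v in it.combinations(g.nodes, 2):
        try:
            path = nx.shortest_path(g, u, v)
        except nx.NetworkXNoPath:
            # nothing is known yet that links u and v
            new_weight = 0 if func(u, v) else math.inf
        else:
            weights = [g.get_edge_data(a, b)['weight']
                       for a, b in zip(path[:-1], path[1:])]
            if min(weights) == math.inf:
                # every edge on the path is "unequal": inconclusive
                new_weight = 0 if func(u, v) else math.inf
            elif max(weights) == math.inf:
                # the path mixes equal and unequal edges: treated as unequal
                new_weight = math.inf
            else:
                # every edge on the path is "equal"
                new_weight = 0
        g.add_edge(u, v, weight=new_weight)
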
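If you don't like having these infinite edges in the graph, you can either:

- remove them once the graph has been built, or
- maintain the final graph in parallel with the one containing the infinite edges, and keep only the final one at the end.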

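This solution is still O(n^2) in the worst case, when there are no connections between the nodes.

Yes, it is still O(n^2) in the worst case, but it is not clear to me that you can avoid that, given how you are approaching the equivalence-relation search problem. By the way, do you actually have an underlying graph structure, or are you assuming you need one in order to find your components? The way you ask it, it looks like you are building a graph from the relation rather than the other way around.

You cannot avoid O(n^2) complexity. All you can do is avoid redundant comparisons.

If the function foo computes the equality of two nodes while respecting the transitive equality among all nodes, why not let that function handle the redundancy in the first place? Or, put the other way around, how does that function make sure it respects the transitive equality among all nodes?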
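The quadratic worst case raised in these comments is easy to confirm by counting calls. The sketch below assumes the strategy of comparing each node with one representative per known class; `count_checks`, `never_equal` and `always_equal` are hypothetical helpers written for this illustration. When no two nodes are equal, every node is compared against all earlier ones, which gives n(n-1)/2 calls; when all nodes are equal, n-1 calls suffice.

```python
def count_checks(nodes, are_equal):
    """Count calls to are_equal while grouping nodes by comparing each
    one against a single representative per class."""
    representatives = []
    calls = 0
    for node in nodes:
        for rep in representatives:
            calls += 1
            if are_equal(rep, node):
                break
        else:
            # node matched no known class, so it represents a new one
            representatives.append(node)
    return calls


def never_equal(u, v):
    return False


def always_equal(u, v):
    return True


n = 100
worst = count_checks(range(n), never_equal)
best = count_checks(range(n), always_equal)
print(worst)  # 4950, i.e. n * (n - 1) // 2
print(best)   # 99, i.e. n - 1
```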