Python: efficient algorithm to extract nodes of the same class from a vector


python algorithm


I have a vector that is the result of a classification; that is, node i and node s[i] are in the same class.

For example:

S=(2,1,1,3,6,7,5,9,6,13,12,14,12,11)

(`s[1]=2`, so nodes 1 and 2 are in the same class; `s[2]=1` gives the same information; `s[3]=1`, so nodes 1, 2 and 3 are all in the same class.)

Now I need a way to get the membership vector from s, i.e. to see which nodes are in the same class:

mVC=[1,1,1,1,2,2,2,2,2,3,3,3,3,3]

(here nodes 1, 2, 3 and 4 are in one class)

This is a graph problem. Here is one approach:

S=(2,1,1,3,6,7,5,9,6,13,12,14,12,11)

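Build a list of tuples, each representing an edge of the graph and holding the (1-based) index of a value in S together with the value itself: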

edges = [(ix+1, i) for ix, i in enumerate(S)]
# [(1, 2), (2, 1), (3, 1), (4, 3), (5, 6), (6, 7), (7, 5), (8,....

Use networkx to build the graph and extract its connected components. This groups the nodes of each class together:

import networkx as nx
G=nx.Graph()
G.add_edges_from(edges)
list(nx.connected_components(G))

Output:

[{1, 2, 3, 4}, {5, 6, 7, 8, 9}, {10, 11, 12, 13, 14}]
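The networkx route returns the classes as sets, not as the mVC vector the question asks for. A minimal sketch of that last conversion step follows, assuming the components contain the question's 1-based node numbers; `components_to_membership` is a hypothetical helper name, and the component list is hard-coded from the example so the snippet runs without networkx:

```python
def components_to_membership(components, n):
    """Turn a list of node sets (1-based node numbers) into a membership
    vector where position i-1 holds the class number of node i."""
    mVC = [0] * n
    # Classes are numbered 1, 2, 3, ... in the order the components are listed.
    for label, component in enumerate(components, start=1):
        for node in component:
            mVC[node - 1] = label
    return mVC


# The three components for the S in the question (hard-coded for the sketch).
components = [{1, 2, 3, 4}, {5, 6, 7, 8, 9}, {10, 11, 12, 13, 14}]
print(components_to_membership(components, 14))
# [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```

With networkx installed, the call would be `components_to_membership(list(nx.connected_components(G)), len(S))`. The class numbers depend on the order in which the components are listed, so two runs can number the classes differently while describing the same partition.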

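The vector S looks like it represents parent relationships, although it may contain cycles. So if you treat this vector as the adjacency list of a directed graph and run a depth-first search (DFS) on it, you will find the connected components of the graph, each of which is one class in your terminology. You can also fill in mVC while running the DFS and get the data in the required format.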

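However, unlike a plain DFS, you need to watch for back edges and cross edges, and when you encounter either kind of edge, update the classification of the node currently being explored.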

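Below is a sample implementation. When it encounters a back edge or a cross edge, the algorithm stops recursing and bubbles the component (i.e. the classification of that edge's target) up to the vertex currently being explored.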

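def dfs(S, u, mVC, currentComponent):
    # Tentatively give u the class currently being built.
    mVC[u] = currentComponent
    if mVC[S[u]] == 0:
        # Tree edge: S[u] is unvisited, so recurse and adopt whatever class
        # the rest of the walk ends up in.
        mVC[u] = dfs(S, S[u], mVC, currentComponent)
    else:
        # Back edge (S[u] is on the current walk) or cross edge (S[u] was
        # classified earlier): copy the class of S[u].
        mVC[u] = mVC[S[u]]
    return mVC[u]

S = [0] + list(S) # to handle the 1-indexing of the content in S
mVC = [0] * len(S)  # 0 means "not classified yet"
currentComponent = 1
for i in range(1, len(S)):
    if mVC[i] == 0:
        componentAssigned = dfs(S, i, mVC, currentComponent)
        # Only move on to a new class number if this walk created a new class
        # (it ended in a back edge) instead of joining an existing one.
        if componentAssigned == currentComponent:
            currentComponent += 1
mVC = mVC[1:] # Gets rid of the dummy 0th element added above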
# at this point, mVC contains the class relationship in the desired format
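The recursive `dfs` above makes one nested call per node along a chain of pointers, so a long chain can exceed CPython's default recursion limit (1000, see `sys.getrecursionlimit()`) and raise `RecursionError`, which matters for the large data sets mentioned in the comments. Below is a sketch of the same idea without recursion, assuming every entry of S is a node number between 1 and len(S); `membership_vector` is a hypothetical name. Each walk follows the pointers until it reaches either a node already on the current walk (a back edge, so a new class) or an already-classified node (a cross edge, so the walk joins that class):

```python
def membership_vector(S):
    """S[i-1] is the node that node i points to (1-based, as in the question).
    Returns mVC, where mVC[i-1] is the class number of node i."""
    n = len(S)
    mVC = [0] * (n + 1)  # slot 0 unused so node numbers stay 1-based
    next_label = 1
    for start in range(1, n + 1):
        if mVC[start] != 0:
            continue  # already classified by an earlier walk
        path = []
        u = start
        # Follow pointers, marking nodes with -1 ("on the current walk"),
        # until we hit a node that is already labeled or already on this walk.
        while mVC[u] == 0:
            mVC[u] = -1
            path.append(u)
            u = S[u - 1]
        if mVC[u] > 0:
            label = mVC[u]          # cross edge: join the existing class
        else:
            label = next_label      # back edge: the walk closed a new cycle
            next_label += 1
        for v in path:
            mVC[v] = label
    return mVC[1:]


S = (2, 1, 1, 3, 6, 7, 5, 9, 6, 13, 12, 14, 12, 11)
print(membership_vector(S))
# [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```

It takes the original 1-based tuple S directly, without the dummy 0th element, and every node is marked once and labeled once, so the work is linear in len(S).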

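Yes, it is a graph, but it is directed, so if one node is connected to another, they are in the same class. Does DFS still work?

Even so, whether it is directed doesn't really change anything in the DFS algorithm. My answer also treats the graph as directed and should work fine.

@somaye I had overlooked a case in the DFS before, but I have fixed it now. You may want to trace the code's execution.

I tried running your code before and the result was incorrect; now I ran the new code and it works perfectly. Thank you very much!

In that case, you may want to reconsider which answer you accept as correct.

Thanks! But it would be better if I could do it without building a new graph. My data set is large and I have to repeat this part many times!

This is the simplest way I can think of to solve this problem. Why many times? Do you have several lists you have to repeat this process for? It should be quite fast, even if you re-create a graph many times.

This vector is the result of a classification; the algorithm generates it many times and compares the results to get the best accuracy. I also have to convert the final vector to mVC, but if it is fast enough, that's fine. I think I should finish the whole code first.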
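For the wish in the comments to avoid building a graph object, a different technique from both answers, a disjoint-set (union-find) structure, works directly on S: each position i merges node i with node S[i]. This is a sketch under the same assumption that S holds 1-based node numbers; `membership_union_find` is a hypothetical name, and union by rank is left out for brevity:

```python
def membership_union_find(S):
    """Return mVC for a 1-based vector S using union-find (no graph object)."""
    n = len(S)
    parent = list(range(n + 1))  # parent[x] == x means x is a root; slot 0 unused

    def find(x):
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Position i says "node i and node S[i] are in the same class".
    for i, s in enumerate(S, start=1):
        ri, rs = find(i), find(s)
        if ri != rs:
            parent[ri] = rs

    # Number classes 1, 2, 3, ... in order of their first member.
    labels = {}
    mVC = []
    for i in range(1, n + 1):
        root = find(i)
        if root not in labels:
            labels[root] = len(labels) + 1
        mVC.append(labels[root])
    return mVC


S = (2, 1, 1, 3, 6, 7, 5, 9, 6, 13, 12, 14, 12, 11)
print(membership_union_find(S))
# [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```

Because classes are numbered by their lowest-numbered member, the same partition always produces the same mVC, which may make it easier to compare the vectors produced by repeated classification runs.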