Python: similarity between two networks?


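I generated two large networks, G and G1, using the networkx package. I want to compute the Jaccard similarity index for all nodes across the two networks.

One possible way to do this is: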

import numpy as np

def returnJaccardNetworks(G, G1):
    tmp = list(G.nodes())
    tmp1 = list(G1.nodes())
    # np.union1d handles node lists of different lengths, unlike np.unique([tmp, tmp1])
    tmp2 = np.union1d(tmp, tmp1)  ### all nodes that appear in either network
    jc = []
    for i in tmp2:
        ## if node i is in both G and G1, compute the similarity
        ## between its lists of adjacent nodes; otherwise append 0
        if (i in G) and (i in G1):
            k1 = list(G[i])   ## adjacent nodes of i in the network G
            k2 = list(G1[i])  ## adjacent nodes of i in the network G1
            ### Start Jaccard similarity
            intersect = list(set(k1) & set(k2))
            n = len(intersect)
            jc.append(n / float(len(k1) + len(k2) - n))
            ### End Jaccard similarity
        else:
            jc.append(0)
    return jc

I was wondering whether there is a more efficient way. I noticed that the package has a function called jaccard_coefficient, but I am not sure how it works.

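Your implementation is already quite efficient (although, in my opinion, not very pretty). With this version I can shave about 15% off the execution time on my machine: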

def get_jaccard_coefficients(G, H):
    for v in G:
        if v in H:
            n = set(G[v]) # neighbors of v in G
            m = set(H[v]) # neighbors of v in H
            length_intersection = len(n & m)
            length_union = len(n) + len(m) - length_intersection
            yield v, float(length_intersection) / length_union
        else:
            yield v, 0. # should really yield v, None as measure is not defined for these nodes
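The comment on the last line suggests yielding None where the measure is undefined. A minimal sketch of that variant follows. It also covers nodes that exist only in H and nodes that have no neighbours in either graph (where the union is empty and the division would fail). The name neighborhood_jaccard and the toy graphs are made up for illustration and are not part of the answer above.

```python
import networkx as nx


def neighborhood_jaccard(G, H):
    """Yield (node, score) for every node that appears in G or H.

    score is the Jaccard index of the node's neighbour sets in G and H,
    or None where the index is undefined: the node is missing from one
    graph, or it has no neighbours in either graph.
    """
    for v in set(G) | set(H):
        if v not in G or v not in H:
            yield v, None
            continue
        n = set(G[v])  # neighbours of v in G
        m = set(H[v])  # neighbours of v in H
        union = n | m
        yield v, (len(n & m) / len(union) if union else None)


# Toy graphs (illustrative only)
G = nx.Graph([(1, 2), (1, 3), (2, 3)])
H = nx.Graph([(1, 2), (1, 4)])
G.add_node(5)  # isolated in both graphs
H.add_node(5)

scores = dict(neighborhood_jaccard(G, H))
print(scores)
```

Whether None or 0 is the better placeholder depends on what happens downstream: None keeps "undefined" distinct from "no overlap", but has to be filtered out before averaging.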

Another version is more compact and easier to maintain, at the cost of about 30% more execution time:

def get_jaccard_coefficients(G, H):
    for v in set(G.nodes) & set(H.nodes): # i.e. the intersection
        n = set(G[v]) # neighbors of v in G
        m = set(H[v]) # neighbors of v in H
        yield v, len(n & m) / float(len(n | m))
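The timing differences quoted here come from the answerer's machine and will vary with graph size and density. A rough sketch for measuring them on your own setup with timeit is below. The function names jaccard_single_pass and jaccard_intersection are just labels for the two variants, and random regular graphs are an assumption chosen so that every node has neighbours (which avoids the empty-union case).

```python
import timeit

import networkx as nx


def jaccard_single_pass(G, H):
    # Variant that loops over G and checks membership in H.
    for v in G:
        if v in H:
            n = set(G[v])
            m = set(H[v])
            inter = len(n & m)
            yield v, float(inter) / (len(n) + len(m) - inter)
        else:
            yield v, 0.0


def jaccard_intersection(G, H):
    # Compact variant that loops over the shared nodes only.
    for v in set(G.nodes) & set(H.nodes):
        n = set(G[v])
        m = set(H[v])
        yield v, len(n & m) / float(len(n | m))


# Two graphs on the same 2000 nodes, each node with exactly 10 neighbours.
G = nx.random_regular_graph(10, 2000, seed=1)
H = nx.random_regular_graph(10, 2000, seed=2)

t_single = timeit.timeit(lambda: list(jaccard_single_pass(G, H)), number=20)
t_inter = timeit.timeit(lambda: list(jaccard_intersection(G, H)), number=20)
print(f"single pass:  {t_single:.3f} s for 20 runs")
print(f"intersection: {t_inter:.3f} s for 20 runs")
```

On graphs where both variants see the same nodes, they should return identical scores, so it is worth comparing dict(...) of both results before trusting any timing.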

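According to the link you provided, jaccard_coefficient(X, list of all 2-tuples of node pairs) should work, where X is G and G' together.

@Bayko Thanks, but how would I write the code?

@Bayko That was my first reaction too, but that approach does not work here, because the OP needs to preserve the identity of the nodes across the two networks, while each node has different connections in each network.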
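For context on the last comment: networkx's jaccard_coefficient compares the neighbourhoods of two different nodes inside a single graph, and yields (u, v, score) triples for the node pairs it is given. A small sketch on a toy path graph (the graph and the chosen pairs are illustrative only):

```python
import networkx as nx

G = nx.path_graph(4)  # edges 0-1, 1-2, 2-3

# Each pair (u, v) is scored by |N(u) & N(v)| / |N(u) | N(v)| within G.
pairs = [(0, 2), (0, 3)]
pair_scores = {(u, v): s for u, v, s in nx.jaccard_coefficient(G, pairs)}

for (u, v), s in pair_scores.items():
    print(u, v, s)
```

Here nodes 0 and 2 share neighbour 1 out of a combined neighbourhood {1, 3}, while nodes 0 and 3 share none. This is a different question from comparing the same node's neighbourhood across two graphs, which is why the hand-written loops above are needed.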