Python: Creating an undirected weighted graph from a co-occurrence matrix


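I am using Python 2.7 to build a project that collects Twitter data and analyzes it. The main idea is to collect tweets and find the most common hashtags used in that collection; I then need to create a graph in which the hashtags are the nodes. If two hashtags appear in the same tweet, there is an edge between them, and the weight of that edge is the number of co-occurrences. So I am trying to build a dictionary of dictionaries with defaultdict(lambda: defaultdict(int)) and to create the graph with networkx.

The code I use to build the co-occurrence matrix is: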
from collections import defaultdict


def coocurrence(common_entities):
    com = defaultdict(lambda: defaultdict(int))

    # Build co-occurrence matrix
    for i in range(len(common_entities) - 1):
        for j in range(i + 1, len(common_entities)):
            w1, w2 = sorted([common_entities[i], common_entities[j]])
            if w1 != w2:
                com[w1][w2] += 1

    return com

However, to use networkx.from_dict_of_dicts I need it in this format:

com = {0: {1: {'weight': 1}}}

Do you know how I can solve this? Or is there another way to create such a graph?
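To make the gap between the two formats concrete, here is a minimal sketch that wraps each integer count of a nested dict in a {'weight': n} attribute dict. The helper name to_weighted and the sample counts are made up for illustration and are not part of the original post.

```python
from collections import defaultdict


def to_weighted(counts):
    """Wrap every integer count in a {'weight': n} attribute dict."""
    return {
        u: {v: {'weight': n} for v, n in neighbours.items()}
        for u, neighbours in counts.items()
    }


# Hypothetical counts in the shape produced by defaultdict(lambda: defaultdict(int)).
counts = defaultdict(lambda: defaultdict(int))
counts['a']['b'] += 2
counts['a']['c'] += 1

weighted = to_weighted(counts)
print(weighted)
```

Assuming networkx is installed, a dict in this shape should be accepted by networkx.from_dict_of_dicts(weighted).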

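First, I would sort the entities up front so that sort is not run over and over inside the loop. Then I would use itertools.combinations to get the pairs. With those changes, a straightforward translation of what you need looks like this: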

from itertools import combinations
from collections import defaultdict


def coocurrence (common_entities):

    com = defaultdict(lambda : defaultdict(lambda: {'weight':0}))

    # Build co-occurrence matrix
    for w1, w2 in combinations(sorted(common_entities), 2):
        if w1 != w2:
            com[w1][w2]['weight'] += 1

    return com

print(coocurrence('abcaqwvv'))
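As a quick check of why sorting once is enough, the sketch below lists what combinations yields for a small sorted input. The string 'abca' is an arbitrary example chosen here because it contains a repeated element.

```python
from itertools import combinations

# 'abca' is an arbitrary illustrative input with a repeated element.
pairs = list(combinations(sorted('abca'), 2))
print(pairs)

# Each pair is already in sorted order, so no per-pair sort is needed.
# Pairs of equal items come only from duplicates in the input, which is
# what the `if w1 != w2` check filters out.
distinct_pairs = [(w1, w2) for w1, w2 in pairs if w1 != w2]
print(distinct_pairs)
```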

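It may be more efficient (less indexing, fewer objects created) to build an intermediate structure first and then produce the final answer in a second loop. The second loop will not run for as many iterations as the first, because all the counts have already been computed. And since the second loop runs fewer iterations, deferring the if statement to the second loop may save even more time. As usual, you can run timeit on several variants if you like, but here is one possible example of the two-loop solution: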

def coocurrence (common_entities):

    com = defaultdict(int)

    # Build co-occurrence matrix
    for w1, w2 in combinations(sorted(common_entities), 2):
        com[w1, w2] += 1

    result = defaultdict(dict)
    for (w1, w2), count in com.items():
        if w1 != w2:
            result[w1][w2] = {'weight': count}
    return result

print(coocurrence('abcaqwvv'))
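For anyone who wants to try the timeit comparison mentioned above, here is a rough sketch. The function names single_loop and two_loops, the sample input, and the run count are all chosen here for illustration; actual timings will depend on your data and machine, so no numbers are claimed.

```python
import timeit
from collections import defaultdict
from itertools import combinations


def single_loop(entities):
    # One pass: nested defaultdicts that create {'weight': 0} on demand.
    com = defaultdict(lambda: defaultdict(lambda: {'weight': 0}))
    for w1, w2 in combinations(sorted(entities), 2):
        if w1 != w2:
            com[w1][w2]['weight'] += 1
    return com


def two_loops(entities):
    # First pass: flat counts keyed by the (w1, w2) tuple.
    counts = defaultdict(int)
    for pair in combinations(sorted(entities), 2):
        counts[pair] += 1
    # Second pass: build the nested result, skipping identical pairs.
    result = defaultdict(dict)
    for (w1, w2), count in counts.items():
        if w1 != w2:
            result[w1][w2] = {'weight': count}
    return result


sample = 'abcaqwvv' * 20  # arbitrary test input, not from the original post
same_result = single_loop(sample) == two_loops(sample)
print('results match: %s' % same_result)

for fn in (single_loop, two_loops):
    seconds = timeit.timeit(lambda: fn(sample), number=20)
    print('%s: %.4f s for 20 runs' % (fn.__name__, seconds))
```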

This is working code, and in my view the best.

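While working further on my project, I wrote a function that builds a graph from the matrix I get with this code:

def create_graph(cooccurrence_matrix):
    g = nx.Graph()
    for e, co in cooccurrence_matrix.iteritems():
        if co >= 3:
            g.add_edge(e[0], e[1], weight=co)
    return g

but when I run it, it raises TypeError: 'collections.defaultdict' object is not callable, and I don't know why. Do you have any idea?

@banana: I don't see anything wrong in that code, but I'm not an nx expert. Usually the full traceback is helpful, but it doesn't fit well in a comment, so if you can produce one, it may be worth posting a new question.

Can you explain why this code is the best?

I used it and ran it, and it works exactly as required and as noted in the code.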
from collections import defaultdict
from itertools import combinations


def coocurrence(*inputs):
    com = defaultdict(int)

    for named_entities in inputs:
        # Build co-occurrence matrix
        for w1, w2 in combinations(sorted(named_entities), 2):
            com[w1, w2] += 1
            com[w2, w1] += 1  # Including both directions

    result = defaultdict(dict)
    for (w1, w2), count in com.items():
        if w1 != w2:
            result[w1][w2] = {'weight': count}
    return result
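As an alternative to the dict-of-dicts format, the counts can also be turned into a list of (node, node, weight) triples, which is another input form networkx accepts. The sketch below is a variation, not the code from the answers above: the function name weighted_edges, the min_weight parameter, and the sample tweets are invented for illustration, and it uses set() so a hashtag repeated within one tweet is counted only once.

```python
from collections import defaultdict
from itertools import combinations


def weighted_edges(tweets, min_weight=1):
    """Return sorted (hashtag, hashtag, count) triples across all tweets."""
    counts = defaultdict(int)
    for hashtags in tweets:
        # set() drops repeats inside a single tweet (an assumption made here).
        for pair in combinations(sorted(set(hashtags)), 2):
            counts[pair] += 1
    return sorted((u, v, n) for (u, v), n in counts.items() if n >= min_weight)


# Made-up sample data: each inner list holds the hashtags of one tweet.
tweets = [
    ['python', 'networkx', 'graph'],
    ['python', 'graph'],
    ['python', 'twitter'],
]

edges = weighted_edges(tweets, min_weight=2)
print(edges)

# With networkx installed, the triples could then be loaded with:
#   g = nx.Graph()
#   g.add_weighted_edges_from(edges)
```

Because nx.Graph is undirected, each pair only needs to be listed once in this form.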