Python: generating an edge list from a Pandas DataFrame by matching row values

python pandas

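I'm new to Python and am trying to run a community detection algorithm on a dataset stored in a pandas DataFrame. To do that, I need to build an edge list from the dataset and load it into a graph. An edge should connect two rows whenever they hold the same value in the same column. The dataset has 19 columns and more than 2,000 rows, and I need an edge for every such match in every column. For example, if the dataset is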

id   col1 col2 col3 
 1    12    10   20
 2    14    10   19
 3    12    10   9

then there would be the following edges:

row1 col1, row3 col1
row1 col2, row2 col2
row1 col2, row3 col2
row2 col2, row3 col2
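
To make the matching rule concrete, here is a minimal sketch that rebuilds the three-row table and lists every pair of rows sharing a value in the same column. In that table, rows 1 and 3 share col1 = 12 and all three rows share col2 = 10. The function name `matching_value_edges` and the choice to use `id` as the index are assumptions made for illustration, not part of the original question.

```python
import itertools

import pandas as pd


def matching_value_edges(df):
    """Return (row_a, row_b, column) for every pair of rows that share a value in a column."""
    edges = []
    for col in df.columns:
        # Each group holds the rows that have the same value in this column.
        for _, group in df.groupby(col):
            for a, b in itertools.combinations(group.index, 2):
                edges.append((a, b, col))
    return edges


# The example table, with `id` as the index so it is not treated as a data column.
example = pd.DataFrame(
    {"col1": [12, 14, 12], "col2": [10, 10, 10], "col3": [20, 19, 9]},
    index=pd.Index([1, 2, 3], name="id"),
)

example_edges = matching_value_edges(example)
for a, b, col in example_edges:
    print(f"row{a} {col}, row{b} {col}")
```

Keeping the column name on each edge makes it easy to check the result against the table by hand. It can be dropped before the pairs are handed to `G.add_edges_from`.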

I have tried several approaches, but none of them works quite the way I want. The closest I have come is the following code:

import itertools

import community  # provided by the python-louvain package
import matplotlib.pyplot as plt
import networkx as nx

# `dataset` is the pandas DataFrame described above.
# Define edges as pairs of rows that share a value in the same column.
edges = set()
for col in dataset:
    for _, data in dataset.groupby(col):
        edges.update(itertools.combinations(data.index, 2))

# Create an empty graph.
G = nx.Graph()
# Add each row's index label as a node.
G.add_nodes_from(dataset.index)
# Add the edges created above.
G.add_edges_from(edges)

# Use the community library to find the partition that maximises modularity (Louvain algorithm).
partition = community.best_partition(G)

# Draw the nodes of each community found by the partition.
size = float(len(set(partition.values())))
pos = nx.spring_layout(G)
count = 0.
for com in set(partition.values()):
    count = count + 1.
    list_nodes = [nodes for nodes in partition.keys()
                  if partition[nodes] == com]
    nx.draw_networkx_nodes(G, pos, list_nodes, node_size=20, cmap=plt.cm.RdYlBu,
                           node_color=list(partition.values()))
plt.show()

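What isn't working for you? This looks like a reasonable way to get the edges.

When the graph is displayed, all the nodes are the same color, which shouldn't be the case. Each community of nodes should be a different color, so I think I must be doing something wrong when building the graph. The edges are the only part I can think of that might be incorrect.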
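
One way to narrow this down is to check how many communities the partition actually contains and to build a color list that lines up one-to-one with the nodes being drawn. The sketch below uses a made-up `partition` dictionary as a stand-in for the result of `community.best_partition(G)`. The helper names `community_sizes` and `aligned_node_colors` are hypothetical. If the count comes back as 1, a single color is the expected result, which can happen when some column has the same value in almost every row and so connects almost every pair of nodes.

```python
from collections import Counter


def community_sizes(partition):
    """Map each community id to the number of nodes assigned to it."""
    return Counter(partition.values())


def aligned_node_colors(nodelist, partition):
    """Return one community id per node, in the same order as nodelist."""
    return [partition[node] for node in nodelist]


# Hypothetical partition for illustration only; in the question it would be
# the dictionary returned by community.best_partition(G).
partition = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

nodelist = sorted(partition)
sizes = community_sizes(partition)
node_colors = aligned_node_colors(nodelist, partition)

print("number of communities:", len(sizes))
print("nodes per community:", dict(sizes))
print("colors in node order:", node_colors)
```

With a list built this way, a single call such as `nx.draw_networkx_nodes(G, pos, nodelist=nodelist, node_color=node_colors, cmap=plt.cm.RdYlBu, node_size=20)` gives each node the color of its own community, because `node_color` and `nodelist` have the same length and order.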