Python: how to create lists by matching elements in one column against another column
I have a dataframe with the following contents:
Locations Locations
1 2
1 3
2 7
2 8
7 11
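The locations come in pairs: for example, birds at location 1 will fly to location 2, but they can also fly to location 3. From location 2 they fly on to location 7, and from there to location 11.
I want to create lists that efficiently link these pairs together, with no duplicated elements.
Expected sample output: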
[1,2,7,11]
[1,3]
[2,8]
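This may be more than you asked for, but this problem is a good fit for a graph using Networkx. You can search for all simple paths between every pair of nodes (locations) in the directed graph defined by the dataframe: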
import networkx as nx
from itertools import combinations

# Create graph from dataframe of pairs (edges)
G = nx.DiGraph()
G.add_edges_from(df.values)

# Find paths, checking both directions for every pair of nodes
paths = []
for pair in combinations(G.nodes(), 2):
    paths.extend(nx.all_simple_paths(G, source=pair[0], target=pair[1]))
    paths.extend(nx.all_simple_paths(G, source=pair[1], target=pair[0]))
paths:
[[1, 2],
[1, 3],
[1, 2, 7],
[1, 2, 8],
[1, 2, 7, 11],
[2, 7],
[2, 8],
[2, 7, 11],
[7, 11]]
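If only the longest chains are wanted, one option is to drop every path that appears as a contiguous run inside a longer one. The following is a minimal sketch in plain Python and is not part of the original answer; the helper names `is_contiguous_run` and `maximal_paths` are made up for illustration, and the input list is copied from the output above.

```python
def is_contiguous_run(short, long):
    """Return True if `short` occurs as a contiguous slice of `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def maximal_paths(paths):
    """Keep only the paths that are not contained in any other path."""
    return [
        p for p in paths
        if not any(p != q and is_contiguous_run(p, q) for q in paths)
    ]


# Paths copied from the output above
all_paths = [[1, 2], [1, 3], [1, 2, 7], [1, 2, 8], [1, 2, 7, 11],
             [2, 7], [2, 8], [2, 7, 11], [7, 11]]
longest = maximal_paths(all_paths)
print(longest)
```

The filter compares every path with every other path, so it is quadratic in the number of paths. That is fine for a handful of locations but may be slow on large graphs.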
You may want to look at networkx:
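import networkx as nx

# Build a directed graph from the two columns of the dataframe.
# The dataframe has no extra columns, so no edge attributes are requested.
G = nx.from_pandas_edgelist(df, source='Locations',
                            target='Locations.1',
                            create_using=nx.DiGraph())
# Roots have no incoming edges; leaves have no outgoing edges
roots = list(v for v, d in G.in_degree() if d == 0)
leaves = list(v for v, d in G.out_degree() if d == 0)
[nx.shortest_path(G, x, y) for y in leaves for x in roots]
Out[58]: [[1, 3], [1, 2, 8], [1, 2, 7, 11]]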
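One caveat, offered as an aside: `nx.shortest_path` raises `NetworkXNoPath` when a leaf cannot be reached from a root, which can happen if the data contains more than one disconnected chain. A hedged sketch of one way around this uses `nx.all_simple_paths`, which yields nothing for unreachable pairs. The extra edge `(20, 21)` is hypothetical and is only there to create a second component.

```python
import networkx as nx

# The question's pairs plus a hypothetical, disconnected pair (20, 21)
edges = [(1, 2), (1, 3), (2, 7), (2, 8), (7, 11), (20, 21)]
G = nx.DiGraph(edges)

roots = [v for v, d in G.in_degree() if d == 0]
leaves = [v for v, d in G.out_degree() if d == 0]

# all_simple_paths returns an empty generator for unreachable pairs,
# so no exception handling is needed here
root_to_leaf = [
    path
    for root in roots
    for leaf in leaves
    for path in nx.all_simple_paths(G, root, leaf)
]
print(root_to_leaf)
```

Unlike `shortest_path`, this returns every route between a root and a leaf, so a graph where two routes lead to the same leaf would produce more than one list for that pair.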
Create a dictionary of lists to represent the graph:
g = {}
for _, l0, l1 in df.itertuples():
    g.setdefault(l0, []).append(l1)
print(g)
{1: [2, 3], 2: [7, 8], 7: [11]}
Then define a recursive function to traverse the graph:
def paths(graph, nodes, path=None):
    if path is None:
        path = []
    for node in nodes:
        new_path = path + [node]
        if node not in graph:
            # A node with no outgoing edges ends the path
            yield new_path
        else:
            yield from paths(graph, graph[node], new_path)

# Roots are the nodes that never appear as a target
roots = g.keys() - set().union(*g.values())
p = [*paths(g, roots)]
print(*p, sep='\n')
[1, 2, 7, 11]
[1, 2, 8]
[1, 3]
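For very long chains the recursion above could hit Python's recursion limit. As a rough sketch only, the same traversal can be written with an explicit stack. The function name `iter_paths` is made up here, and the sketch assumes the graph has no cycles and that the node labels can be sorted.

```python
def iter_paths(graph, roots):
    """Yield every root-to-leaf path using an explicit stack instead of recursion."""
    # Start with one partial path per root; reversed so the smallest root is popped first
    stack = [[root] for root in sorted(roots, reverse=True)]
    while stack:
        path = stack.pop()
        children = graph.get(path[-1])
        if not children:
            # No outgoing edges: the path is complete
            yield path
        else:
            # Push in reverse so children are visited in their listed order
            for child in reversed(children):
                stack.append(path + [child])


g = {1: [2, 3], 2: [7, 8], 7: [11]}
roots = g.keys() - set().union(*g.values())
result = list(iter_paths(g, roots))
print(*result, sep='\n')
```

Each stack entry carries its own copy of the path so far, which keeps the code short at the cost of some extra memory.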
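I found this way of solving your problem without involving any graphs. However, the loop drops rows as it goes, so you have to work on a copy of the dataframe if you want to use it afterwards. Your data must also be sorted in the same order as in the example.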
import pandas as pd

df = pd.DataFrame(columns=["loc1","Loc2"],data=[[1,2],[1,3],[2,7],[2,8],[7,11]])
res = []
n = -1
m = -1
x = 0
for i in df.values:
    if(x in df.index):  # test whether row i has already been deleted
        res.append(i.tolist())  # saving the value
        m = m +1  # m is for later use as index of res
        tmp = i[1]
        for j in df.values:
            n = n +1  # n is the index of the df rows
            if(j[0] == tmp):
                res[m].append(j[1])
                df = df.drop(df.index[n])  # deleting the row from which the value was taken
                tmp = res[m][len(res[m])-1]
                n = n -1  # one row was dropped, so step the position back
    n = -1
    x = x+1
print(res)
[[1, 2, 7, 11], [1, 3], [2, 8]]
I know it's not the prettiest, but it works.
Take a look at networkx.
Nice answer @WeNYoBen
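The row-dropping loop in the last answer can also be sketched without modifying the dataframe, by marking pairs as used instead of deleting them. This is an illustrative variation and not the answerer's code. The function name `chain_pairs` is made up, and like the original it assumes the pairs are sorted as in the example.

```python
def chain_pairs(pairs):
    """Greedily link (source, target) pairs into chains, using each pair once."""
    used = [False] * len(pairs)
    chains = []
    for i, (src, dst) in enumerate(pairs):
        if used[i]:
            continue  # this pair was already absorbed into an earlier chain
        used[i] = True
        chain = [src, dst]
        # Single forward pass: extend the chain whenever a later unused pair
        # starts at the chain's current last element
        for j in range(i + 1, len(pairs)):
            if not used[j] and pairs[j][0] == chain[-1]:
                chain.append(pairs[j][1])
                used[j] = True
        chains.append(chain)
    return chains


pairs = [(1, 2), (1, 3), (2, 7), (2, 8), (7, 11)]
chains = chain_pairs(pairs)
print(chains)
```

With a dataframe, the pairs could be obtained with `list(df.itertuples(index=False, name=None))`, which leaves `df` itself untouched.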