Python 3.x: Easily generating an edge list from a specific structure using pandas


python-3.x pandas graph


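This is a question about how to use pandas properly (I am using version 1.0). Suppose I have a DataFrame of missions, each holding an origin and one or more destinations: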

   mid from         to
0    0    A        [C]
1    1    A     [B, C]
2    2    B        [B]
3    3    C  [D, E, F]

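For example: for the mission (mid=1), people will travel from A to B, then from B to C, and finally from C back to A. Note that I have no control over the data model of the input DataFrame.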
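To make the round trip concrete, here is a minimal sketch in plain Python, without pandas. The helper name legs is hypothetical and exists only for this illustration; it assumes the origin is a single label and the destinations are given as a list.

```python
def legs(origin, destinations):
    """Return the consecutive (from, to) pairs of a round trip."""
    # Full path: start at the origin, visit each destination, return to the origin.
    path = [origin] + list(destinations) + [origin]
    # Pair each stop with the stop that follows it.
    return list(zip(path, path[1:]))


print(legs("A", ["B", "C"]))  # [('A', 'B'), ('B', 'C'), ('C', 'A')]
```

A mission with n destinations therefore always yields n + 1 trips.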

I would like to compute metrics for each trip of a mission. The expected output is exactly:

    tid  mid from to
0     0    0    A  C
1     1    0    C  A
2     2    1    A  B
3     3    1    B  C
4     4    1    C  A
5     5    2    B  B
6     6    2    B  B
7     7    3    C  D
8     8    3    D  E
9     9    3    E  F
10   10    3    F  C

I have found a way to achieve my goal. Please find the MCVE below:

import pandas as pd

# Input:
df = pd.DataFrame(
    [["A", ["C"]],
     ["A", ["B", "C"]],
     ["B", ["B"]],
     ["C", ["D", "E", "F"]]],
    columns = ["from", "to"]
).reset_index().rename(columns={'index': 'mid'})

# Create chain (wrap the origin in a list so multi-character labels are not split):
df['chain'] = df.apply(lambda x: [x['from']] + x['to'] + [x['from']], axis=1)
# Explode chain:
df = df.explode('chain')
# Shift to create travel:
df['end'] = df.groupby("mid")["chain"].shift(-1)
# Remove extra row, clean, reindex and rename:
df = df.dropna(subset=['end']).reset_index(drop=True).reset_index().rename(columns={'index': 'tid'})
df = df.drop(['from', 'to'], axis=1).rename(columns={'chain': 'from', 'end': 'to'})

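My question is: is there a better/simpler way to do this with pandas? By better I mean not necessarily more performant (although it can be, of course), but more readable and intuitive.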

What you are doing is basically explode and concat:

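# turn the series of lists into a single series
tmp = df[['mid','to']].explode('to')

# new `from` is the concatenation of `from` and the list;
# a stable sort keeps rows that share an index label in concat order
df1 = pd.concat((df[['mid','from']],
                 tmp.rename(columns={'to':'from'})
                )
               ).sort_index(kind='mergesort')

# new `to` is the concatenation of the list and `from`
df2 = pd.concat((tmp,
                 df[['mid','from']].rename(columns={'from':'to'})
                )
               ).sort_index(kind='mergesort')

# both frames are in the same row order, so assign by position
df1['to'] = df2['to'].to_numpy()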

Output:

   mid from to
0    0    A  C
0    0    C  A
1    1    A  B
1    1    B  C
1    1    C  A
2    2    B  B
2    2    B  B
3    3    C  D
3    3    D  E
3    3    E  F
3    3    F  C
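The result above still carries the duplicated original index and has no tid column, unlike the expected output in the question. A minimal sketch of one way to add it, assuming a frame shaped like df1; the first rows are rebuilt by hand here so that the snippet runs on its own:

```python
import pandas as pd

# Hand-built stand-in for the first rows of df1 (duplicated index on purpose).
df1 = pd.DataFrame(
    {"mid": [0, 0, 1, 1, 1],
     "from": ["A", "C", "A", "B", "C"],
     "to": ["C", "A", "B", "C", "A"]},
    index=[0, 0, 1, 1, 1],
)

# Discard the duplicated index, then expose the fresh 0..n-1 index as `tid`.
result = df1.reset_index(drop=True).rename_axis("tid").reset_index()
print(result)
```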

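If you don't mind reconstructing the entire DataFrame, you can clean it up a bit with np.roll to get the pairs of destinations, and then assign the value of mid based on the number of trips (the length of each sublist in l):
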
import pandas as pd
import numpy as np
from itertools import chain

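# one path per mission: the origin followed by its destinations
l = [[fr]+to for fr,to in zip(df['from'], df['to'])]

# pair each stop with the next one; np.roll wraps the last stop back to the origin
df1 = (pd.DataFrame(data=chain.from_iterable([zip(sl, np.roll(sl, -1)) for sl in l]),
                    columns=['from', 'to'])
         # repeat each mid once per trip (the length of its sublist in l)
         .assign(mid=np.repeat(df['mid'].to_numpy(), [*map(len, l)])))

Output: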

   from to  mid
0     A  C    0
1     C  A    0
2     A  B    1
3     B  C    1
4     C  A    1
5     B  B    2
6     B  B    2
7     C  D    3
8     D  E    3
9     E  F    3
10    F  C    3
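For comparison, the same pairs can be built with a plain loop instead of np.roll. This is only a sketch of an alternative, not part of the answer above; the names records and edges are made up for the illustration, and the input frame is rebuilt from the question so that the snippet is self-contained.

```python
import pandas as pd

# Input frame from the question.
df = pd.DataFrame(
    {"mid": [0, 1, 2, 3],
     "from": ["A", "A", "B", "C"],
     "to": [["C"], ["B", "C"], ["B"], ["D", "E", "F"]]}
)

records = []
for mid, origin, dests in zip(df["mid"], df["from"], df["to"]):
    # Round-trip path: origin, every destination, then back to the origin.
    path = [origin] + list(dests) + [origin]
    # One record per pair of consecutive stops.
    records.extend((mid, a, b) for a, b in zip(path, path[1:]))

edges = pd.DataFrame(records, columns=["mid", "from", "to"])
# Use the fresh 0..n-1 index as the trip id, placed first as in the expected output.
edges.insert(0, "tid", edges.index)
print(edges)
```

The loop gives up vectorisation but keeps every step visible, which may suit the readability goal stated in the question.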