Creating a distance matrix in Python from individual distances


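I have a list of the distances between each pair of adjacent stations on a railway line, in the order the stations appear. What I need is a matrix with the distance between every two stations. Here is the list: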



    +-------------------------+-------------------------+---------------+
    |    Departure Station    |     Arrival Station     | distance in m |
    +-------------------------+-------------------------+---------------+
    |                         | San Francisco           |           0.0 |
    | San Francisco           | 22nd Street             |   2521.949349 |
    | 22nd Street             | Bayshore                |     5875.8986 |
    | Bayshore                | South San Francisco     |   6690.161279 |
    | South San Francisco     | San Bruno               |   2964.853585 |
    | San Bruno               | Millbrae Transit Center |   4154.792069 |
    | Millbrae Transit Center | Broadway                |   2549.171972 |
    | Broadway                | Burlingame              |   1762.653178 |
    | Burlingame              | San Mateo               |   2307.847611 |
    | San Mateo               | Hayward Park            |   2148.992125 |
    | Hayward Park            | Hillsdale               |   2597.932334 |
    | Hillsdale               | Belmont                 |       2092.15 |
    | Belmont                 | San Carlos              |   1990.239598 |
    | San Carlos              | Redwood City            |   3492.618122 |
    | Redwood City            | Atherton                |   3847.644532 |
    | Atherton                | Menlo Park              |    1752.92218 |
    | Menlo Park              | Palo Alto               |   2011.382315 |
    | Palo Alto               | Stanford                |   1582.663905 |
    | Stanford                | California Ave.         |       965.606 |
    | California Ave.         | San Antonio             |   3939.685111 |
    | San Antonio             | Mountain View           |   3108.414275 |
    | Mountain View           | Sunnyvale               |    4312.51742 |
    | Sunnyvale               | Lawrence                |   3189.943773 |
    | Lawrence                | Santa Clara             |   5889.680131 |
    | Santa Clara             | College Park            |    2252.43061 |
    | College Park            | San Jose Diridon        |   1872.857195 |
    | San Jose Diridon        | Tamien                  |   2887.967478 |
    | Tamien                  | Capitol                 |    4999.21158 |
    | Capitol                 | Blossom Hill            |   5304.202424 |
    | Blossom Hill            | Morgan Hill             |   19050.76536 |
    | Morgan Hill             | San Martin              |     5917.5495 |
    | San Martin              | Gilroy                  |   10061.59472 |
    | Gilroy                  | Gilroy                  |           0.0 |
    +-------------------------+-------------------------+---------------+



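My idea was to build a list of the distances and a dictionary mapping each station to its index, then fill the matrix by looking up the two stations' indices in the dictionary and summing the distances over that index range. I have tried building the matrix this way many times but could not get it to work.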
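The index-range idea can be written down precisely. Using the station indices from the dictionary shown further down, and writing s_k for the value in row k of the distance column (s_0 = 0 for San Francisco), the distance between stations i and j is the sum of the segments after the smaller index up to and including the larger one. The symbols d, s_k and p_i are introduced here only for illustration; p_i is the cumulative distance of station i from San Francisco.

```latex
d(i, j) = \sum_{k=\min(i,j)+1}^{\max(i,j)} s_k = \lvert p_j - p_i \rvert,
\qquad p_i = \sum_{k=0}^{i} s_k
```

For example, San Francisco (index 0) to Bayshore (index 2) is s_1 + s_2 = 2521.949349 + 5875.8986 = 8397.847949 m.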

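    import pandas as pd

    # pd.read_csv accepts a path directly, so no separate open() call is needed
    dist = pd.read_csv('/Users/miss_evgenia/Downloads/Caltrain Metrics - Sheet4.csv')
    distances = list(dist['distance in m'])
    #%%
    names = list(dist['Departure Station'])
    names.pop(0)  # the first row has no departure station
    names = dict(zip(names, range(len(names))))  # station name -> position on the line
    #%%
    def sumRange(L, a, b):
        """Return L[a] + L[a+1] + ... + L[b]."""
        total = 0  # renamed from `sum` so the built-in sum() is not shadowed
        for i in range(a, b + 1):
            total += L[i]
        return total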
Here are my dictionary and list:

    {'San Francisco': 0, '22nd Street': 1, 'Bayshore': 2, 'South San Francisco': 3, 'San Bruno': 4, 'Millbrae Transit Center': 5, 'Broadway': 6, 'Burlingame': 7, 'San Mateo': 8, 'Hayward Park': 9, 'Hillsdale': 10, 'Belmont': 11, 'San Carlos': 12, 'Redwood City': 13, 'Atherton': 14, 'Menlo Park': 15, 'Palo Alto': 16, 'Stanford': 17, 'California Ave.': 18, 'San Antonio': 19, 'Mountain View': 20, 'Sunnyvale': 21, 'Lawrence': 22, 'Santa Clara': 23, 'College Park': 24, 'San Jose Diridon': 25, 'Tamien': 26, 'Capitol': 27, 'Blossom Hill': 28, 'Morgan Hill': 29, 'San Martin': 30, 'Gilroy': 31}

    [0.0, 2521.949349, 5875.8986, 6690.161279, 2964.8535850000003, 4154.792069, 2549.171972, 1762.653178, 2307.847611, 2148.992125, 2597.932334, 2092.15, 1990.2395980000001, 3492.618122, 3847.6445320000003, 1752.92218, 2011.3823149999998, 1582.663905, 965.6060000000001, 3939.685111, 3108.414275, 4312.51742, 3189.943773, 5889.680131, 2252.4306100000003, 1872.8571949999998, 2887.967478, 4999.21158, 5304.202424, 19050.765359999998, 5917.5495, 10061.594720000001, 0.0]

Help! Thanks.
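The approach described in the question can be completed roughly as below. This is a minimal sketch that uses only the first four stations from the question's data so it runs on its own; build_matrix and sum_range are hypothetical names. The detail that is easy to get wrong is the offset: distances[k] is the segment that ends at station k, so the trip from station i to station j (i < j) sums distances[i+1] through distances[j], not distances[i] through distances[j].

```python
import pandas as pd

# Excerpt of the question's data: distances[k] is the segment ending at
# station k (distances[0] is the 0.0 row for San Francisco).
distances = [0.0, 2521.949349, 5875.8986, 6690.161279]
names = {'San Francisco': 0, '22nd Street': 1, 'Bayshore': 2, 'South San Francisco': 3}


def sum_range(values, a, b):
    """Sum values[a] through values[b], inclusive (like the question's sumRange)."""
    return sum(values[a:b + 1])


def build_matrix(names, distances):
    """Hypothetical helper: fill an n x n matrix using the index dictionary."""
    stations = sorted(names, key=names.get)  # station names in line order
    matrix = pd.DataFrame(0.0, index=stations, columns=stations)
    for a in stations:
        for b in stations:
            i, j = sorted((names[a], names[b]))
            # Travelling from station i to station j covers segments i+1 .. j;
            # when i == j the slice is empty and the distance is 0.
            matrix.loc[a, b] = sum_range(distances, i + 1, j)
    return matrix


matrix = build_matrix(names, distances)
print(matrix.loc['San Francisco', 'Bayshore'])
```

This sums a range for every pair, so it repeats a lot of additions; for a line of 32 stations that hardly matters, but the cumulative-distance approach in the answers below avoids the repeated summing.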

If the departure and arrival stations use the same names, maybe you can try something like this:

    import numpy as np
    import pandas as pd

    table = distance_table.dropna()  # drop the first row (its empty departure cell is read as NaN)
    cities = np.unique(table["Departure Station"])
    matrix = pd.DataFrame(columns=cities, index=cities)
    for dep, arr, d in table.itertuples(index=False):  # one adjacent pair per row
        matrix.at[dep, arr] = d
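where distance_table is the table shown in your question. Note that this only fills in the cells for adjacent stations; all other cells remain NaN. Perhaps you could even use .apply() here.

Alternatively, you can compute each station's "position" as the cumsum of the distances and then use the positions to compute the distances: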

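    from scipy.spatial.distance import pdist, squareform

    # data is the DataFrame from the question; a station's position is its
    # cumulative distance from San Francisco
    positions = data['distance in m'].cumsum()
    # pdist needs 2-D input, hence [:, None]; squareform expands the result to n x n
    matrix = squareform(pdist(positions.to_numpy()[:, None], 'euclidean'))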
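To see what pdist and squareform compute here, the following toy sketch (three made-up positions, not the Caltrain data) compares them with a plain NumPy broadcasting version. With one-dimensional positions the Euclidean distance reduces to the absolute difference, so both give the same matrix.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Toy positions (made up): cumulative distances of three stations.
positions = np.array([0.0, 2.0, 5.0])

# pdist expects a 2-D array of observations, so reshape (3,) -> (3, 1).
# It returns the condensed upper triangle in the order (0,1), (0,2), (1,2).
condensed = pdist(positions[:, None], 'euclidean')   # [2., 5., 3.]
matrix = squareform(condensed)                        # full 3 x 3 matrix

# The same matrix via NumPy broadcasting, without SciPy.
same = np.abs(positions[None, :] - positions[:, None])
print(matrix)
```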

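In addition to a_guest's answer, you could also try the following, which returns the result as a pandas DataFrame labelled with the station names: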

    import numpy as np
    import pandas as pd

    def transform_dataframe():
        station_distances = pd.read_csv("test_data.csv")
        # Drop the last row (Gilroy -> Gilroy) so Gilroy does not appear twice
        station_distances = station_distances.iloc[:-1]
        # Cumulative distance of each arrival station from San Francisco
        cumulative_distances = station_distances['distance in m'].cumsum().to_numpy()

        # Broadcasting gives every pairwise difference; abs() keeps the distances non-negative
        distance_matrix = np.abs(cumulative_distances - cumulative_distances[:, None])
        stations = station_distances["Arrival Station"]
        return pd.DataFrame(distance_matrix, index=stations, columns=stations)
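A usage sketch, assuming you would rather pass a DataFrame than have the function read a fixed file name: distance_matrix_from_frame is a hypothetical variant of transform_dataframe, and the CSV text below is a small made-up excerpt in the question's format (an empty first departure cell and a trailing "same station" row). The station labels let you look up any pair by name.

```python
import io

import numpy as np
import pandas as pd

# Made-up excerpt in the question's format; the last row mimics Gilroy -> Gilroy.
CSV = """Departure Station,Arrival Station,distance in m
,San Francisco,0.0
San Francisco,22nd Street,2521.949349
22nd Street,Bayshore,5875.8986
Bayshore,Bayshore,0.0
"""


def distance_matrix_from_frame(station_distances):
    """Hypothetical variant of transform_dataframe() that takes a DataFrame."""
    station_distances = station_distances.iloc[:-1]  # drop the trailing X -> X row
    positions = station_distances['distance in m'].cumsum().to_numpy()
    stations = station_distances['Arrival Station']
    return pd.DataFrame(np.abs(positions - positions[:, None]),
                        index=stations, columns=stations)


matrix = distance_matrix_from_frame(pd.read_csv(io.StringIO(CSV)))
# Look up a pair of stations by name
print(matrix.loc['22nd Street', 'Bayshore'])
```

Taking a DataFrame instead of a path also makes the function easy to try on small inputs like this one.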

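I would convert the list into cumulative distances from San Francisco to each station; the distance between any two stations is then the difference between their distances from San Francisco. I'll write it up for you now.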
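The cumulative-distance idea can also be written without pandas or NumPy. A minimal sketch, assuming the segment lengths are already in a plain list in line order, starting with the 0.0 entry for the first station (only the first four stations are included here):

```python
from itertools import accumulate

# Excerpt of the question's list: segment lengths in line order.
segments = [0.0, 2521.949349, 5875.8986, 6690.161279]
stations = ['San Francisco', '22nd Street', 'Bayshore', 'South San Francisco']

# Cumulative distance of each station from San Francisco.
positions = list(accumulate(segments))

# Distance between two stations = difference of their positions.
matrix = [[abs(pj - pi) for pj in positions] for pi in positions]

for name, row in zip(stations, matrix):
    print(f"{name:>20}", " ".join(f"{d:10.1f}" for d in row))
```

Each station's position is computed once, so filling the matrix is just one subtraction per pair.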