Python pandas: filling in skipped stops in a timetable


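I have two different DataFrames.
The first DataFrame stores the possible train connections (like a timetable):

The second DataFrame holds measurements of the actual train stops: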

index start stop passengers
0     a     b    2
1     b     d    4
2     a     c    1
3     c     d    2
4     g     j    5

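Sometimes a train does not stop at a station. What I am trying to do is fill in the missing stops while still keeping track of the passenger measurements: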

index route start stop passengers
0     1     a     b    2
1     1     b     c    4
2     1     c     d    4
3     1     a     b    1
4     1     b     c    1
5     1     c     d    2
6     2     g     h    5
7     2     h     i    5
8     2     i     j    5

So I simply want to fill in all the skipped stops.

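As Wen pointed out, pandas may not be the best choice for representing this data. If you do want to use pandas, I would suggest switching from "connected stations" in the df (next row = next stop, unless it is a different route, with letters defining the order) to numeric identifiers, and keeping the route, names, etc. in separate columns. With numeric identifiers, here is one possible implementation that adds up the passengers per segment. The routes are distinguished by station numbers in the 100s or the 200s: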

import numpy as np
import pandas as pd

# Timetable: one row per segment between two consecutive stations
table = pd.DataFrame({'route':['g','g','g','g','r','r','r'],'start':[101,102,103,104,201,202,203],
                  'stop':[102,103,104,105,202,203,204],'count':[0,0,0,0,0,0,0]})
# Measurements: where passengers boarded, where they left, and how many
passenger = pd.DataFrame({'start':[101,102,202],'stop':[104,103,204],
                         'passenger':[2,5,3]})

# Combine start, stop and passenger count into one tuple per measurement
count = list(zip(passenger.start, passenger.stop, passenger.passenger))
for first, last, n in count:
    # Visit every segment the trip covers. The upper bound is exclusive:
    # the segment that starts at the exit station is not part of the trip.
    for x in range(first, last):
        table['count'] = np.where(table.start == x, table['count'] + n, table['count'])
table  # now with the passenger data
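If the stations keep their letter names as in the question, the skipped stops can also be filled in with two merges. The following is only a sketch under assumptions: the timetable contents are not shown in the question, so the `timetable` frame here is a guess based on the desired output, and the names `pos`, `first_pos`, `last_pos` and `trip` are made up for illustration. It also assumes that each station belongs to exactly one route and that timetable rows are listed in travel order.

```python
import pandas as pd

# Assumed timetable (not given in the question): one row per segment, in travel order
timetable = pd.DataFrame({
    "route": [1, 1, 1, 2, 2, 2],
    "start": ["a", "b", "c", "g", "h", "i"],
    "stop":  ["b", "c", "d", "h", "i", "j"],
})
# Measurements from the question
measured = pd.DataFrame({
    "start": ["a", "b", "a", "c", "g"],
    "stop":  ["b", "d", "c", "d", "j"],
    "passengers": [2, 4, 1, 2, 5],
})

# Position of every segment within its route (0, 1, 2, ...)
timetable["pos"] = timetable.groupby("route").cumcount()

# Look up the first and the last segment of each measured trip
first = timetable[["route", "start", "pos"]].rename(columns={"pos": "first_pos"})
last = timetable[["route", "stop", "pos"]].rename(columns={"pos": "last_pos"})
trips = measured.rename_axis("trip").reset_index()
trips = trips.merge(first, on="start").merge(last, on=["route", "stop"])

# Pair each trip with all segments of its route, then keep the covered ones
segments = trips.drop(columns=["start", "stop"]).merge(timetable, on="route")
covered = (segments["pos"] >= segments["first_pos"]) & (segments["pos"] <= segments["last_pos"])
result = (
    segments[covered]
    .sort_values(["trip", "pos"])
    [["route", "start", "stop", "passengers"]]
    .reset_index(drop=True)
)
print(result)
```

Each measured trip is repeated once per segment it covers, carrying its passenger count along, which is the shape of the desired table in the question. If a station can appear on several routes, the first merge would need an extra key to pick the right route.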

This is more of a network (graph) problem.
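To illustrate the graph view, here is a minimal sketch that uses only the standard library: the timetable becomes a directed graph, and each measured trip is expanded by a breadth-first search from its start station to its stop station. The `TIMETABLE` contents are an assumption reconstructed from the desired output in the question, and the helper names `build_graph`, `find_path` and `expand_measurements` are hypothetical.

```python
from collections import deque

# Assumed timetable (not given in the question): (route, start, stop) per segment
TIMETABLE = [
    (1, "a", "b"), (1, "b", "c"), (1, "c", "d"),
    (2, "g", "h"), (2, "h", "i"), (2, "i", "j"),
]

def build_graph(timetable):
    """Map each station to the list of (next_station, route) it connects to."""
    graph = {}
    for route, start, stop in timetable:
        graph.setdefault(start, []).append((stop, route))
    return graph

def find_path(graph, source, target):
    """Breadth-first search; returns the (route, start, stop) segments or None."""
    came_from = {source: None}
    queue = deque([source])
    while queue:
        station = queue.popleft()
        if station == target:
            break
        for nxt, route in graph.get(station, []):
            if nxt not in came_from:
                came_from[nxt] = (station, route)
                queue.append(nxt)
    if target not in came_from:
        return None
    # Walk back from the target to the source, then reverse
    segments = []
    station = target
    while came_from[station] is not None:
        prev, route = came_from[station]
        segments.append((route, prev, station))
        station = prev
    return segments[::-1]

def expand_measurements(graph, measurements):
    """Replace every measured trip by its consecutive timetable segments."""
    rows = []
    for start, stop, passengers in measurements:
        path = find_path(graph, start, stop)
        if path is None:
            raise ValueError(f"no connection from {start} to {stop}")
        rows.extend((route, s, t, passengers) for route, s, t in path)
    return rows

measurements = [("a", "b", 2), ("b", "d", 4), ("a", "c", 1), ("c", "d", 2), ("g", "j", 5)]
rows = expand_measurements(build_graph(TIMETABLE), measurements)
for row in rows:
    print(row)
```

Unlike an approach based on station order, a path search like this would also cope with branching lines, at the cost of leaving pandas for the expansion step. The resulting rows can be passed to `pd.DataFrame` afterwards if a DataFrame is needed.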
