Creating an igraph graph from GTFS data using R


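My goal is to convert GTFS stop and trip information into a graph in which the vertices are stops (from the GTFS stops.txt) and the edges are trips (from the GTFS stop_times.txt). The first step is straightforward: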

> library(igraph)

#Reading in GTFS files
> stops<-read.csv("stops.txt")
> stop_times<-read.csv("stop_times.txt")
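That is, stop_times contains the stop_id together with the arrival and departure times at each stop, whereas I would like each row to hold start_stop_id, end_stop_id, start_time and end_time (in effect, not "stops" but "transits" derived from the stops). This conversion looks challenging to me: I would have to iterate over the rows of stop_times and check whether consecutive rows belong to the same trip_id; if they do, compute the start/end data, and if they do not, insert NULL or find some other way to keep the trips separate. This confuses me quite a bit.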


Is there an elegant way to combine these two data frames into the desired graph?

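You can generate the "from" and "to" columns by shifting values from the next row up into the "current" row. The stop information can then simply be joined on.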

Let me explain with an example:
library(data.table)
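
Before the full-size feed, the shifting idea can be sketched on a tiny made-up table. The trips "T1" and "T2" and the stops "A" to "D" are invented for illustration and do not come from any real feed; only `shift()`, `setorder()` and `:=` from data.table are used.

```r
library(data.table)

# Made-up stop_times rows for two hypothetical trips (not from a real feed)
toy <- data.table(
  trip_id       = c("T1", "T1", "T1", "T2", "T2"),
  stop_id       = c("A", "B", "C", "B", "D"),
  stop_sequence = c(1L, 2L, 3L, 1L, 2L)
)

# Order the rows within each trip, then pull the next row's stop_id
# into the current row, separately for every trip_id
setorder(toy, trip_id, stop_sequence)
toy[, stop_id_to := shift(stop_id, type = "lead"), by = trip_id]

# Rows whose stop_id_to is NA are the final stop of a trip and form no edge
edges <- toy[!is.na(stop_id_to)]
```

Because the shift is done `by = trip_id`, the last stop of one trip is never paired with the first stop of the next, which is the part the question found awkward to do with an explicit loop.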

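Note, however, that a GTFS.zip file may contain several transport modes (train, bus, metro, etc.), and because service frequency varies, some pairs of stops are far better connected than others. It is still not clear to me how these two points should be taken into account when building a graph from GTFS.zip. A possible way forward is to weight each edge by its frequency and to build a multilayer network in which each transport mode is treated as an interdependent layer, with some stops shared between layers.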
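
The frequency-weighting idea could look roughly like the sketch below. It assumes an edge list with one row per consecutive stop pair per trip; the data are made up, and `weighted_links` is a name introduced here for illustration only.

```r
library(data.table)

# Made-up edge list: one row per consecutive stop pair per trip (hypothetical data)
links <- data.table(
  stop_id    = c("A", "B", "A", "B", "A"),
  stop_id_to = c("B", "C", "B", "C", "B"),
  trip_id    = c("T1", "T1", "T2", "T2", "T3")
)

# Collapse parallel edges: weight = number of trips serving each stop pair
weighted_links <- links[, .(weight = .N), by = .(stop_id, stop_id_to)]
```

If `weighted_links` is passed to `graph_from_data_frame()`, the `weight` column should be carried along as an edge attribute. For the multilayer part, the transport mode is recorded as `route_type` in routes.txt and can be reached from a trip through trips.txt; how to couple the layers is left open here, as it is in the paragraph above.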

> head(stop_times)
  trip_id stop_id arrival_time departure_time stop_sequence shape_dist_traveled
1 A895151  F04272     06:20:00       06:20:00            10                   0
2 A895151  F04184     06:22:00       06:22:00            20                 648
3 A895151  F04319     06:24:00       06:24:00            30                1224
4 A895151  F04369     06:27:00       06:27:00            40                2779
5 A895151  008264     06:31:00       06:31:00            50                5620
6 A895151  F01520     06:33:00       06:33:00            60                6691
## here I'm using Melbourne's GTFS ("http://transitfeeds.com/p/ptv/497/latest/download")

#dt_stop_times <- lst[[6]]$stop_times
#dt_stops <- lst[[7]]$stops

#setDT(dt_stop_times)
#setDT(dt_stops)


## join on whatever stop information you want
dt_stop_times <- dt_stop_times[ dt_stops, on = c("stop_id"), nomatch = 0]

## set the order of stops for each group (in this case, each group is a trip_id)
setorder(dt_stop_times, trip_id, stop_sequence)

## create a new column by shifting the stop_id of the following row up 
dt_stop_times[, stop_id_to := shift(stop_id, type = "lead"), by = .(trip_id)]

## you will have NAs at this point because the last stop doesn't go anywhere.

## you can do the same operation on multiple columns at the same time
dt_stop_times[, `:=`(stop_id_to = shift(stop_id, type = "lead"), 
                     arrival_time_stop_to = shift(arrival_time, type = "lead"),
                     departure_time_stop_to = shift(departure_time, type = "lead")),
              by = .(trip_id)]

## now you have your 'from' and 'to' columns from which you can make your igraph

## here's a subset of the result
dt_stop_times[, .(trip_id, stop_id, stop_name_from = stop_name, arrival_time, stop_id_to, arrival_time_stop_to)]

#                           trip_id stop_id                                                  stop_name_from arrival_time stop_id_to
# 1:          1.T0.3-86-A-mjp-1.7.R    4174                                    71-RMIT/Plenty Rd (Bundoora)     25:42:00       4485
# 2:          1.T0.3-86-A-mjp-1.7.R    4485                            70-Janefield Dr/Plenty Rd (Bundoora)     25:43:00       4486
# 3:          1.T0.3-86-A-mjp-1.7.R    4486                              69-Taunton Dr/Plenty Rd (Bundoora)     25:44:00       4487
# 4:          1.T0.3-86-A-mjp-1.7.R    4487                           68-Greenhills Rd/Plenty Rd (Bundoora)     25:45:00       4488
# 5:          1.T0.3-86-A-mjp-1.7.R    4488                      67-Bundoora Square SC/Plenty Rd (Bundoora)     25:46:00       4489
# ---                                                                                                                         
# 9415793: 9999.UQ.3-19-E-mjp-1.1.H   17871           7-Queen Victoria Market/Elizabeth St (Melbourne City)     23:25:00      17873
# 9415794: 9999.UQ.3-19-E-mjp-1.1.H   17873       5-Melbourne Central Station/Elizabeth St (Melbourne City)     23:27:00      17875
# 9415795: 9999.UQ.3-19-E-mjp-1.1.H   17875              3-Bourke Street Mall/Elizabeth St (Melbourne City)     23:30:00      17876
# 9415796: 9999.UQ.3-19-E-mjp-1.1.H   17876                      2-Collins St/Elizabeth St (Melbourne City)     23:31:00      17877
# 9415797: 9999.UQ.3-19-E-mjp-1.1.H   17877 1-Flinders Street Railway Station/Elizabeth St (Melbourne City)     23:32:00         NA
#          arrival_time_stop_to
# 1:                   25:43:00
# 2:                   25:44:00
# 3:                   25:45:00
# 4:                   25:46:00
# 5:                   25:47:00
# ---                     
# 9415793:             23:27:00
# 9415794:             23:30:00
# 9415795:             23:31:00
# 9415796:             23:32:00
# 9415797:                   NA
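
The output above contains times such as 25:42:00: GTFS allows hours beyond 23 for trips that run past midnight of the service day, so these strings do not parse as ordinary clock times. If a travel time per edge is wanted, one option is to convert the strings to seconds first. The helper below is a minimal sketch in base R; `gtfs_time_to_seconds` is a hypothetical name, not a function from data.table or igraph.

```r
# Convert a GTFS "HH:MM:SS" string to seconds since the start of the service day.
# Hours may exceed 23, so time-of-day classes are deliberately avoided.
gtfs_time_to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) {
    # NA input (e.g. the last stop of a trip) or malformed strings give NA
    if (length(p) != 3L || anyNA(p)) return(NA_real_)
    sum(as.numeric(p) * c(3600, 60, 1))
  }, numeric(1))
}

# Two sample rows: a normal hop and the last stop of a trip (no "to" time)
from_time <- c("25:42:00", "23:32:00")
to_time   <- c("25:43:00", NA)
travel_seconds <- gtfs_time_to_seconds(to_time) - gtfs_time_to_seconds(from_time)
```

Applied to the shifted columns, the same subtraction could be stored as an extra column and later used as an edge attribute.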
# get a data frame of nodes (one row per stop)
nodes <- dt_stops[, .(stop_id, stop_lon, stop_lat)]

# links between consecutive stops; drop rows with no next stop (the last stop of each trip)
links <- dt_stop_times[!is.na(stop_id_to), .(stop_id, stop_id_to, trip_id)]

# create graph
g <- graph_from_data_frame(links, directed = TRUE, vertices = nodes)
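
As a quick sanity check, the same call can be tried on a three-stop toy network. The data below are invented; the first two columns of the edge table are read as "from" and "to", and any further columns (here `trip_id`) become edge attributes.

```r
library(igraph)

# Made-up nodes and links (hypothetical stops A, B, C on one trip T1)
nodes <- data.frame(stop_id = c("A", "B", "C"), stringsAsFactors = FALSE)
links <- data.frame(
  stop_id    = c("A", "B"),
  stop_id_to = c("B", "C"),
  trip_id    = c("T1", "T1"),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(links, directed = TRUE, vertices = nodes)

n_stops <- vcount(g)       # number of vertices (stops)
n_links <- ecount(g)       # number of edges (stop-to-stop links)
edge_trips <- E(g)$trip_id # trip_id carried along as an edge attribute
```

On a real feed the graph will contain many parallel edges, one per trip between the same pair of stops, so the edge count is usually far larger than the number of distinct stop pairs.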