Creating an igraph graph from GTFS data in R
My goal is to convert GTFS stop and trip information into a graph in which the vertices are stops (from the GTFS stops.txt) and the edges are trips (from the GTFS stop_times.txt). The first step is straightforward:
> library(igraph)
#Reading in GTFS files
> stops<-read.csv("stops.txt")
> stop_times<-read.csv("stop_times.txt")
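This means that stop_times holds, for each row, a stop_id with the arrival and departure times at that stop, whereas I would like each row to hold start_stop_id, end_stop_id, start_time and end_time (in effect, not "stops" but "transits" derived from the stops). This conversion seems challenging to me, because I would have to iterate over the rows of stop_times, check whether consecutive rows belong to the same trip_id, compute the start-end data if they do, and otherwise insert NULL or find some other way to keep the trips apart... This confuses me.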
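Is there an elegant way to combine these two data frames into the desired graph?

You can generate the "from" and "to" columns by "shifting" values from the next row onto the "current" row. The stop information can then simply be joined on. Let me explain with an example: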
library(data.table)
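Note, however, that a GTFS.zip file may contain several modes of transport (train, bus, metro, etc.), and that because service frequencies vary, some pairs of stops are far better connected than others. It is still not clear to me how these two points should be taken into account when building a graph from a GTFS.zip file. One possible approach would be to weight each edge by its frequency and to build a multilayer network in which each transport mode is treated as an interdependent layer, with some stops shared between layers.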
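The frequency-weighting idea could be sketched as follows. This is only a minimal illustration on a made-up edge list (the toy `links` table and the name `weighted_links` are assumptions, not part of the original answer): it counts how many trips traverse each directed stop pair and stores that count as an edge weight.

```r
library(data.table)
library(igraph)

# Hypothetical toy edge list: one row per consecutive stop pair per trip
links <- data.table(
  stop_id    = c("A", "B", "A", "B", "A", "C"),
  stop_id_to = c("B", "C", "B", "C", "B", "D"),
  trip_id    = c("t1", "t1", "t2", "t2", "t3", "t4")
)

# Count how many trips traverse each directed stop pair
weighted_links <- links[, .(weight = .N), by = .(stop_id, stop_id_to)]

# Build a directed graph; the 'weight' column becomes an edge attribute
g_weighted <- graph_from_data_frame(weighted_links, directed = TRUE)

# Inspect the edge weights
E(g_weighted)$weight
```

Separating transport modes would presumably need an extra join, since in GTFS the mode (route_type) is stored in routes.txt and is linked to stop_times.txt through trips.txt. That step is not shown here.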
>head(stop_times)
trip_id stop_id arrival_time departure_time stop_sequence shape_dist_traveled
1 A895151 F04272 06:20:00 06:20:00 10 0
2 A895151 F04184 06:22:00 06:22:00 20 648
3 A895151 F04319 06:24:00 06:24:00 30 1224
4 A895151 F04369 06:27:00 06:27:00 40 2779
5 A895151 008264 06:31:00 06:31:00 50 5620
6 A895151 F01520 06:33:00 06:33:00 60 6691
## here I'm using Melbourne's GTFS ("http://transitfeeds.com/p/ptv/497/latest/download")
#dt_stop_times <- lst[[6]]$stop_times
#dt_stops <- lst[[7]]$stops
#setDT(dt_stop_times)
#setDT(dt_stops)
## join on whatever stop information you want
dt_stop_times <- dt_stop_times[ dt_stops, on = c("stop_id"), nomatch = 0]
## set the order of stops for each group (in this case, each group is a trip_id)
setorder(dt_stop_times, trip_id, stop_sequence)
## create a new column by shifting the stop_id of the following row up
dt_stop_times[, stop_id_to := shift(stop_id, type = "lead"), by = .(trip_id)]
## you will have NAs at this point because the last stop doesn't go anywhere.
## you can do the same operation on multiple columns at the same time
dt_stop_times[, `:=`(stop_id_to = shift(stop_id, type = "lead"),
                     arrival_time_stop_to = shift(arrival_time, type = "lead"),
                     departure_time_stop_to = shift(departure_time, type = "lead")),
              by = .(trip_id)]
## now you have your 'from' and 'to' columns from which you can make your igraph
## here's a subset of the result
dt_stop_times[, .(trip_id, stop_id, stop_name_from = stop_name, arrival_time, stop_id_to, arrival_time_stop_to)]
# trip_id stop_id stop_name_from arrival_time stop_id_to
# 1: 1.T0.3-86-A-mjp-1.7.R 4174 71-RMIT/Plenty Rd (Bundoora) 25:42:00 4485
# 2: 1.T0.3-86-A-mjp-1.7.R 4485 70-Janefield Dr/Plenty Rd (Bundoora) 25:43:00 4486
# 3: 1.T0.3-86-A-mjp-1.7.R 4486 69-Taunton Dr/Plenty Rd (Bundoora) 25:44:00 4487
# 4: 1.T0.3-86-A-mjp-1.7.R 4487 68-Greenhills Rd/Plenty Rd (Bundoora) 25:45:00 4488
# 5: 1.T0.3-86-A-mjp-1.7.R 4488 67-Bundoora Square SC/Plenty Rd (Bundoora) 25:46:00 4489
# ---
# 9415793: 9999.UQ.3-19-E-mjp-1.1.H 17871 7-Queen Victoria Market/Elizabeth St (Melbourne City) 23:25:00 17873
# 9415794: 9999.UQ.3-19-E-mjp-1.1.H 17873 5-Melbourne Central Station/Elizabeth St (Melbourne City) 23:27:00 17875
# 9415795: 9999.UQ.3-19-E-mjp-1.1.H 17875 3-Bourke Street Mall/Elizabeth St (Melbourne City) 23:30:00 17876
# 9415796: 9999.UQ.3-19-E-mjp-1.1.H 17876 2-Collins St/Elizabeth St (Melbourne City) 23:31:00 17877
# 9415797: 9999.UQ.3-19-E-mjp-1.1.H 17877 1-Flinders Street Railway Station/Elizabeth St (Melbourne City) 23:32:00 NA
# arrival_time_stop_to
# 1: 25:43:00
# 2: 25:44:00
# 3: 25:45:00
# 4: 25:46:00
# 5: 25:47:00
# ---
# 9415793: 23:27:00
# 9415794: 23:30:00
# 9415795: 23:31:00
# 9415796: 23:32:00
# 9415797: NA
# get a df with nodes
nodes <- dt_stops[, .(stop_id, stop_lon, stop_lat)]
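# links between stops (drop the last stop of each trip, whose stop_id_to is NA,
# otherwise graph_from_data_frame() finds an "NA" vertex that is not in 'nodes')
links <- dt_stop_times[!is.na(stop_id_to), .(stop_id, stop_id_to, trip_id)]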
# create graph
g <- graph_from_data_frame(links , directed=TRUE, vertices=nodes)
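For a quick check of the whole approach without downloading a feed, here is a small self-contained sketch on made-up data. The toy tables, the `to_seconds()` helper and the `travel_secs` column are assumptions added for illustration only. It repeats the shift-by-trip step, drops the last stop of each trip, and also derives a travel time per edge. GTFS times may run past 24:00:00, so the times are parsed by hand rather than as clock times.

```r
library(data.table)
library(igraph)

# Hypothetical toy tables standing in for stops.txt and stop_times.txt
dt_stops <- data.table(
  stop_id  = c("S1", "S2", "S3"),
  stop_lon = c(144.96, 144.97, 144.98),
  stop_lat = c(-37.81, -37.82, -37.83)
)
dt_stop_times <- data.table(
  trip_id       = c("t1", "t1", "t1", "t2", "t2"),
  stop_id       = c("S1", "S2", "S3", "S3", "S1"),
  arrival_time  = c("06:20:00", "06:22:00", "06:24:00", "25:42:00", "25:50:00"),
  stop_sequence = c(10, 20, 30, 10, 20)
)

# Order stops within each trip, then pull the next row's values up
setorder(dt_stop_times, trip_id, stop_sequence)
dt_stop_times[, `:=`(stop_id_to = shift(stop_id, type = "lead"),
                     arrival_time_stop_to = shift(arrival_time, type = "lead")),
              by = .(trip_id)]

# Hypothetical helper: "HH:MM:SS" to seconds, also valid past 24:00:00
to_seconds <- function(x) {
  parts <- tstrsplit(x, ":", fixed = TRUE)
  as.integer(parts[[1]]) * 3600L +
    as.integer(parts[[2]]) * 60L +
    as.integer(parts[[3]])
}

# Keep only rows that have a following stop, and add a travel time per edge
links <- dt_stop_times[!is.na(stop_id_to),
                       .(stop_id, stop_id_to, trip_id,
                         travel_secs = to_seconds(arrival_time_stop_to) -
                                       to_seconds(arrival_time))]
nodes <- dt_stops[, .(stop_id, stop_lon, stop_lat)]

g <- graph_from_data_frame(links, directed = TRUE, vertices = nodes)
```

Here each trip contributes one edge per consecutive stop pair, so repeated trips over the same pair produce parallel edges. Whether to keep them or collapse them is a modelling choice.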