How can I use the spread() and gather() functions in R to restructure a given booked-trips dataset into the desired linked-trips dataset?

I have a booked-trips dataset, as shown below:

bktrips <- data.frame(
  userID =c("P001", "P001", "P001", "P001", "P001", "P002", "P002", "P002", "P002"), 
  mode = c("bus", "train", "taxi", "bus", "train", "taxi","bus", "train", "taxi"), 
  Origin = c("O1", "O2", "O3", "O4", "O5", "O6", "O7", "O8", "O9"), 
  Destination = c("D1", "D2", "D3", "D4", "D5", "D6", "D7","D8", "D9" ), 
  depart_dt = c("2019-11-05 8:00:00","2019-11-05 8:30:00", "2019-11-05 11:00:00", "2019-11-05 11:40:00", "2019-11-06 8:00:00", "2019-11-06 9:10:00", "2019-11-07 8:00:00", "2019-11-08 8:00:00", "2019-11-08 8:50:00"), 
  Olat = c("-33.87085", "-33.87138", "-33.79504", "-33.87832", "-33.89158", "-33.88993", "-33.89173", "-33.88573", "-33.88505"), 
  Olon = c("151.2073", "151.2039", "151.2737", "151.2174","151.2485", "151.2805","151.2469", "151.2169","151.2156"), 
  Dlat = c("-33.87372", "-33.87384", "-33.88323", "-33.89165", "-33.88993", "-33.89177", "-33.88573", "-33.87731", "-33.88573"), 
  Dlon = c("151.1957", "151.2126", "151.2175", "151.2471","151.2471", "151.2805","151.2514", "151.2175","151.2169")
)

Here is an approach that uses dplyr and geosphere to compute the distances. I use lubridate to fix your date column.

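First, we fix the column classes. Second, we rely on the fact that trips must happen in chronological order. We therefore use lag from dplyr and distHaversine from geosphere to compute the distance from the previous destination, as well as the time since the previous departure.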

library(dplyr)
library(geosphere)
library(lubridate)
bktrips %>%
  mutate(depart_dt = ymd_hms(depart_dt)) %>%
  mutate_at(vars(contains(c("lat","lon"))),list(~as.numeric(as.character(.)))) %>%
  group_by(userID) %>% 
  arrange(depart_dt,.by_group = TRUE) %>%
  mutate(DistPrevDest = distHaversine(cbind(Olon,Olat),cbind(lag(Dlon),lag(Dlat))),
         TimePrevDep = difftime(depart_dt,lag(depart_dt))) %>%
  dplyr::select(-depart_dt,-contains(c("lat","lon")))
  userID mode  Origin Destination DistPrevDest TimePrevDep
  <fct>  <fct> <fct>  <fct>              <dbl> <drtn>     
1 P001   bus   O1     D1                   NA    NA mins  
2 P001   train O2     D2                  801.   30 mins  
3 P001   taxi  O3     D3                10434.  150 mins  
4 P001   bus   O4     D4                  547.   40 mins  
5 P001   train O5     D5                  130. 1220 mins  
6 P002   taxi  O6     D6                   NA    NA mins  
7 P002   bus   O7     D7                 3105. 1370 mins  
8 P002   train O8     D8                 3188. 1440 mins  
9 P002   taxi  O9     D9                  879.   50 mins  
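I suggest that you also include arrival times in your data and instead compute the difference between each departure time and the previous arrival time.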
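As a rough sketch of that suggestion, assume the data had an arrival-time column. The column name arrive_dt and the timestamps below are hypothetical and are not part of the original dataset. Under that assumption, the gap between each departure and the previous arrival could be computed along these lines:

```r
library(dplyr)

# Toy data: arrive_dt is a hypothetical column with made-up values,
# used only to illustrate the idea.
trips <- data.frame(
  userID    = c("P001", "P001", "P002"),
  depart_dt = as.POSIXct(c("2019-11-05 08:00:00", "2019-11-05 08:30:00",
                           "2019-11-06 09:10:00"), tz = "UTC"),
  arrive_dt = as.POSIXct(c("2019-11-05 08:20:00", "2019-11-05 08:55:00",
                           "2019-11-06 09:40:00"), tz = "UTC")
)

gaps <- trips %>%
  group_by(userID) %>%
  arrange(depart_dt, .by_group = TRUE) %>%
  # Minutes between this departure and the previous trip's arrival;
  # the first trip of each user has no previous trip, so it is NA.
  mutate(TimePrevArr = as.numeric(difftime(depart_dt, lag(arrive_dt),
                                           units = "mins"))) %>%
  ungroup()
```

Because lag() is applied within each userID group, the first trip of every user gets NA rather than a value carried over from another user.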

Edit: a cumsum() was missing. It is fixed now. Also, rleid is no longer needed.

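I'm not sure where you want to go with this, but here is a start on computing the distance and time of each trip within each group (by userID). I had to quickly find a package for computing distances from latitude and longitude, and came across geosphere. Hope this helps.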

library(dplyr)
library(tibble)
library(geosphere)

bktrips <- tibble(
  userID =c("P001", "P001", "P001", "P001", "P001", "P002", "P002", "P002", "P002"), 
  mode = c("bus", "train", "taxi", "bus", "train", "taxi","bus", "train", "taxi"), 
  Origin = c("O1", "O2", "O3", "O4", "O5", "O6", "O7", "O8", "O9"), 
  Destination = c("D1", "D2", "D3", "D4", "D5", "D6", "D7","D8", "D9" ), 
  depart_dt = c("2019-11-05 8:00:00","2019-11-05 8:30:00", "2019-11-05 11:00:00", "2019-11-05 11:40:00", "2019-11-06 8:00:00", "2019-11-06 9:10:00", "2019-11-07 8:00:00", "2019-11-08 8:00:00", "2019-11-08 8:50:00"), 
  Olat = c("-33.87085", "-33.87138", "-33.79504", "-33.87832", "-33.89158", "-33.88993", "-33.89173", "-33.88573", "-33.88505"), 
  Olon = c("151.2073", "151.2039", "151.2737", "151.2174","151.2485", "151.2805","151.2469", "151.2169","151.2156"), 
  Dlat = c("-33.87372", "-33.87384", "-33.88323", "-33.89165", "-33.88993", "-33.89177", "-33.88573", "-33.87731", "-33.88573"), 
  Dlon = c("151.1957", "151.2126", "151.2175", "151.2471","151.2471", "151.2805","151.2514", "151.2175","151.2169")
)

bktrips <- bktrips %>%
  mutate(depart_dt = as.POSIXct(depart_dt, format = "%Y-%m-%d %H:%M:%S"),
         Olat = as.numeric(Olat),
         Olon = as.numeric(Olon),
         Dlat = as.numeric(Dlat),
         Dlon = as.numeric(Dlon)) %>%
  group_by(userID) %>%
  mutate(trip_time = as.numeric(depart_dt - lag(depart_dt), units = 'mins')) %>%
  rowwise() %>%
  mutate(trip_distance = distm(x = c(Olon, Olat), y = c(Dlon, Dlat), fun = distHaversine))

> bktrips
Source: local data frame [9 x 11]
Groups: <by row>

# A tibble: 9 x 11
  userID mode  Origin Destination depart_dt            Olat  Olon  Dlat  Dlon trip_time trip_distance
  <chr>  <chr> <chr>  <chr>       <dttm>              <dbl> <dbl> <dbl> <dbl>     <dbl>         <dbl>
1 P001   bus   O1     D1          2019-11-05 08:00:00 -33.9  151. -33.9  151.        NA         1119.
2 P001   train O2     D2          2019-11-05 08:30:00 -33.9  151. -33.9  151.        30          849.
3 P001   taxi  O3     D3          2019-11-05 11:00:00 -33.8  151. -33.9  151.       150        11108.
4 P001   bus   O4     D4          2019-11-05 11:40:00 -33.9  151. -33.9  151.        40         3120.
5 P001   train O5     D5          2019-11-06 08:00:00 -33.9  151. -33.9  151.      1220          225.
6 P002   taxi  O6     D6          2019-11-06 09:10:00 -33.9  151. -33.9  151.        NA          205.
7 P002   bus   O7     D7          2019-11-07 08:00:00 -33.9  151. -33.9  151.      1370          787.
8 P002   train O8     D8          2019-11-08 08:00:00 -33.9  151. -33.9  151.      1440          939.
9 P002   taxi  O9     D9          2019-11-08 08:50:00 -33.9  151. -33.9  151.        50          142.

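Thanks for the great edit, Ben. Can you help me solve this problem?

Where is the end of the previous trip (the arrival date/time)?

Thank you, Edward. In my actual data, most of the arrival times are missing.

Dear Ian, thank you very much for your excellent work. What you have done here is correct and matches what I expected for this problem. Also, in my actual dataset most of the arrival times are missing, which is why I can only work with departure times. Much love.

Glad it works for you! As a bit of feedback, I spent a long time fighting incorrect results from distHaversine because the lat and long values were factors and were being wrongly coerced to integers. In future, please try to use dput(bktrips) to provide sample data whose columns already have the correct classes.

Thank you very much, Paul, for the advice and support.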
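The factor pitfall mentioned in the comment above can be reproduced with a minimal sketch. The vector name lat_factor is illustrative only. Calling as.numeric() directly on a factor returns the internal integer level codes, not the numbers printed as labels, which is why the first answer converts through as.character() first:

```r
# Coordinates stored as a factor, as data.frame() produced for strings
# by default in R versions before 4.0.
lat_factor <- factor(c("-33.87085", "-33.87138", "-33.79504"))

# Wrong: returns the integer level codes, not the latitudes.
wrong <- as.numeric(lat_factor)

# Right: convert the labels to character first, then to numeric.
right <- as.numeric(as.character(lat_factor))
```

Sharing sample data with dput(), as suggested, preserves the column classes, so anyone answering sees the same types as the person asking.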