R:如何使用交通智能卡数据将行程段组合到行程中

R:如何使用交通智能卡数据将行程段组合到行程中,r,transport,R,Transport,目前正在使用一个有趣的交通智能卡数据集。当前数据中的每条线路代表一次跳闸(例如,从a到B的母线跳闸)。60分钟内的任何行程都需要分组为行程 当前表格: CustomerID SegmentID OriginLocation Destination Start Time End Time Fare TypeOfTransport .. .. .. A001 101 LocationA LocationB 7:30am

目前正在使用一个有趣的交通智能卡数据集。当前数据中的每条线路代表一次跳闸(例如,从a到B的母线跳闸)。60分钟内的任何行程都需要分组为行程

当前表格:

CustomerID  SegmentID   OriginLocation  Destination Start Time  End Time    Fare    TypeOfTransport ..  ..  ..
A001        101         LocationA   LocationB       7:30am      7:45am      1.5        Bus          
A001        102         LocationB   LocationC       7:50am      8:30am      3.5        Train            
A001        103         LocationC   LocationB       17:10pm     18:00pm     3.5        Train            
A001        104         LocationB   LocationA       18:10pm     18:30pm     1.5        Bus          
A002        105         LocationK   LocationY       11:30am     12:30pm     3          Train            
A003        106         LocationP   LocationO       10:23am     11:13am     4          Ferrie           
并隐藏成这样的东西:

CustomerID  JourneyID   OriginLocation  Destination Start Time  End Time    Fare    TypeOfTransport    NumOfTrips
A001        1           LocationA       LocationC   7:30am      8:30am      5       Intermodal        2
A001        2           LocationC       LocationA   17:10pm     18:30pm     5       Intermodal        2
A002        6           LocationK       LocationY   11:30am     12:30pm     3       Train             1
A003        8           LocationP       LocationO   10:23am     11:13am     4       Ferrie            1

我是R新手,不知道如何开始,因此希望您能提供任何指导。

这里是您示例数据的起点。你的真实数据可能会让事情变得更复杂

library(dplyr)

## prepare data
df<- read.table(header=T, stringsAsFactors = FALSE, text="
CustomerID  SegmentID   OriginLocation  Destination StartTime  EndTime    Fare    TypeOfTransport
A001        101         LocationA   LocationB       7:30am      7:45am      1.5        Bus
A001        102         LocationB   LocationC       7:50am      8:30am      3.5        Train
A001        103         LocationC   LocationB       17:10pm     18:00pm     3.5        Train
A001        104         LocationB   LocationA       18:10pm     18:30pm     1.5        Bus
A002        105         LocationK   LocationY       11:30am     12:30pm     3          Train
A003        106         LocationP   LocationO       10:23am     11:13am     4          Ferrie")
df$StartTime <- as.POSIXct(df$StartTime, format="%H:%M%p")
df$EndTime <- as.POSIXct(df$EndTime, format="%H:%M%p")


## Get group for each observation according to start-/endtimes + gap (60secs+60mins)
getGrp <- function(EndTimes, StartTimes, maxgap = 60*60) {
  vbetween <- Vectorize(dplyr::between, c("left", "right"))
  none <- Negate(any)
  if (length(EndTimes) > 1) {
    mat <-   vbetween(EndTimes, StartTimes-maxgap, StartTimes)
    idx <- apply(mat, 1, none, na.rm = T)
    grp <- cumsum(c(TRUE, head(idx, -1)))
  } else {
    grp <- 1L
  }
  return(grp)
}

## Summarise
df %>% 
  group_by(CustomerID) %>% 
  mutate(grp = getGrp(EndTime, StartTime)) %>%
  group_by(grp, add = TRUE) %>%
  summarise(Fare = sum(Fare), 
            NumOfTrips = n(), 
            StartTime = format(min(StartTime), "%H:%M"), 
            EndTime = format(max(EndTime), "%H:%M"), 
            OriginLocation = OriginLocation[row_number(1)],
            Destination = tail(Destination, 1),
            TypeOfTransport = list(TypeOfTransport)) %>%
  select(-grp) 

## Result
# CustomerID  Fare NumOfTrips StartTime EndTime OriginLocation Destination TypeOfTransport
# (chr) (dbl)      (int)     (chr)   (chr)          (chr)       (chr)           (chr)
# 1       A001     5          2     07:30   08:30      LocationA   LocationC        <chr[2]>
# 2       A001     5          2     17:10   18:30      LocationC   LocationA        <chr[2]>
# 3       A002     3          1     11:30   12:30      LocationK   LocationY        <chr[1]>
# 4       A003     4          1     10:23   11:13      LocationP   LocationO        <chr[1]>
库(dplyr)
##准备数据

dfSide注意:我注意到您过去没有关闭任何问题。如果有一个解决了您的问题,您应该单击它旁边的复选标记。非常感谢!我正在运行RStudio,代码的摘要部分返回以下错误:“error in UseMethod(“group_by_u”):没有适用于“group_by_u”的方法应用于类的对象”函数“不客气。示例代码在这里运行良好-我无法重现您的错误。我的代码中有一个输入错误,现在一切都运行良好!再次感谢,