R: accumulate over a time period and count unique values

r, time, dplyr, xts

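How can I aggregate this data by 15-minute periods (clock time), accumulating the seconds and counting the unique ids for each loc?

The output for this example should look like this: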

> dput(df.out)
structure(list(unique.id = c(3, 7, 2, 2, 4), loc = c("A", "A", 
"A", "B", "B"), time = structure(c(1425172501, 1425173400, 1425174300, 
1425321900, 1425322800), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    secs = c(318, 380, 6, 43, 138)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -5L), .Names = c("unique.id", 
"loc", "time", "secs"))
I have successfully used the xts package to compute the seconds:

library(dplyr)  # select()
library(xts)    # as.xts(), period.sum(), endpoints(), align.time()
## disregarding the loc grouping:
df.test <- select(df, time, secs)
df.test <- na.omit(df.test) ##xts with period.sum does not like NA
df.test <- as.xts(df.test, order.by = df.test$time)
df.test <- period.sum(df.test$secs, endpoints(df.test , "mins", k=15))
df.test <- align.time(df.test , 15*60)

Here is a solution using dplyr. Round each time up to the end of its 15-minute interval, then group and summarise.

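library(dplyr)

# Round each timestamp up to the end of its 15-minute clock interval
df$time <- as.POSIXct(ceiling(as.double(df$time) / (15 * 60)) * (15 * 60),
                      origin = '1970-01-01')

# Count distinct ids and sum the seconds per interval and loc
df %>%
  group_by(time, loc) %>%
  summarise(unique.id = n_distinct(id), secs = sum(secs)) %>%
  select(unique.id, loc, time, secs)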
The output is:

Source: local data frame [5 x 4]
Groups: time [5]

  unique.id    loc                time  secs
      <int> <fctr>              <dttm> <dbl>
1         3      A 2015-03-01 03:15:00   318
2         7      A 2015-03-01 03:30:00   380
3         2      A 2015-03-01 03:45:00     6
4         2      B 2015-03-02 20:45:00    43
5         4      B 2015-03-02 21:00:00   138
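
To make the rounding step easier to check by hand, here is a minimal base R sketch of the same idea. The data frame `toy` and its values are made up for illustration (the question's input `df` is not shown in this thread), and `aggregate()` stands in for the dplyr pipeline, so treat it as one possible way to reproduce the logic rather than the method used above.

```r
# Hypothetical toy data; the real input `df` from the question is not shown.
toy <- data.frame(
  id   = c(1, 2, 2, 3, 4),
  loc  = c("A", "A", "A", "A", "B"),
  time = as.POSIXct(c(1425172000, 1425172400, 1425172501,
                      1425173400, 1425321500),
                    origin = "1970-01-01", tz = "UTC"),
  secs = c(100, 200, 18, 50, 43)
)

period <- 15 * 60  # interval width in seconds

# Round each timestamp up to the end of its 15-minute clock interval.
# A timestamp that already sits on a boundary stays where it is.
toy$bin <- as.POSIXct(ceiling(as.numeric(toy$time) / period) * period,
                      origin = "1970-01-01", tz = "UTC")

# Same grouping keys for both aggregations, so the rows line up.
by_keys <- list(loc = toy$loc, time = toy$bin)

# Sum the seconds per (loc, interval).
secs_sum <- aggregate(toy["secs"], by = by_keys, FUN = sum)

# Count the distinct ids per (loc, interval).
id_count <- aggregate(toy["id"], by = by_keys,
                      FUN = function(x) length(unique(x)))

# Columns: unique.id, loc, time, secs
result <- data.frame(unique.id = id_count$id, secs_sum)
print(result)
```

Note that `ceiling()` labels each interval by its end time, which matches the labels in the expected output; using `floor()` instead would label intervals by their start.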

That is so simple! I can't believe I could just take the ceiling... Thanks