R: cut by fixed time interval with 2 or more variables using data.table


I have a data frame:

df <- data.frame(time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36", "2015-09-07 01:47:45",
"2015-09-07 02:00:17", "2015-09-07 02:07:30", "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21", "2015-09-07 04:04:22"), 
inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT")) 

> df
                  time inOut
1  2015-09-07 00:32:19    IN
2  2015-09-07 01:02:30   OUT
3  2015-09-07 01:31:36    IN
4  2015-09-07 01:47:45    IN
5  2015-09-07 02:00:17    IN
6  2015-09-07 02:07:30    IN
7  2015-09-07 03:39:41    IN
8  2015-09-07 04:04:21   OUT
9  2015-09-07 04:04:21    IN
10 2015-09-07 04:04:22   OUT
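I want to count the number of IN/OUT events in each 15-minute interval. I can do this by creating two more data frames, in_df and out_df, cutting each of them into 15-minute intervals, and then merging them to get my result. outdf is my expected result.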

in_df <- df[which(df$inOut== "IN"),]
out_df <- df[which(df$inOut== "OUT"),]

a <- data.frame(table(cut(as.POSIXct(in_df$time), breaks="15 mins")))
b <- data.frame(table(cut(as.POSIXct(out_df$time), breaks="15 mins")))
colnames(b) <- c("Time", "Out")
colnames(a) <- c("Time", "In")

outdf <- merge(a,b, all=TRUE)
outdf[is.na(outdf)] <- 0

> outdf
                  Time In Out
1  2015-09-07 00:32:00  1   0
2  2015-09-07 00:47:00  0   0
3  2015-09-07 01:02:00  0   1
4  2015-09-07 01:17:00  1   0
5  2015-09-07 01:32:00  0   0
6  2015-09-07 01:47:00  2   0
7  2015-09-07 02:02:00  1   0
8  2015-09-07 02:17:00  0   0
9  2015-09-07 02:32:00  0   0
10 2015-09-07 02:47:00  0   0
11 2015-09-07 03:02:00  0   0
12 2015-09-07 03:17:00  0   0
13 2015-09-07 03:32:00  1   0
14 2015-09-07 03:47:00  0   0
15 2015-09-07 04:02:00  1   2
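
A side note on the bin boundaries in the output above: they start at 00:32:00, not on the clock quarter-hour, because cut() with breaks="15 mins" appears to begin at the first timestamp truncated to the minute. A minimal sketch using two timestamps from the data (the tz = "UTC" argument is an assumption added here so the result does not depend on the local time zone):

```r
# Two timestamps taken from the question; tz = "UTC" is an assumption made
# here so the result does not depend on the local time zone
x <- as.POSIXct(c("2015-09-07 00:32:19", "2015-09-07 01:02:30"), tz = "UTC")

# 15-minute bins; empty bins between observations are kept as factor levels
bins <- cut(x, breaks = "15 mins")

# Bin start times (the labels of the factor levels)
print(levels(bins))

# Counts per bin, including the empty bin in the middle
counts <- table(bins)
print(counts)
```

Because the empty bins survive as factor levels, table() reports them with a count of 0, which is why outdf has a row for every interval.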

With data.table, I would do this:

library(data.table)
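setDT(df)

# table() only reports a zero for absent values if inOut is a factor;
# data.frame() no longer creates factors by default in R >= 4.0
df[, inOut := factor(inOut)]
df[, timeCut := cut(as.POSIXct(time), breaks="15 mins")]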

df[J(timeCut = levels(timeCut)), 
   as.list(table(inOut)), 
   on = "timeCut", 
   by = .EACHI]
which gives:

                timeCut IN OUT
 1: 2015-09-07 00:32:00  1   0
 2: 2015-09-07 00:47:00  0   0
 3: 2015-09-07 01:02:00  0   1
 4: 2015-09-07 01:17:00  1   0
 5: 2015-09-07 01:32:00  0   0
 6: 2015-09-07 01:47:00  2   0
 7: 2015-09-07 02:02:00  1   0
 8: 2015-09-07 02:17:00  0   0
 9: 2015-09-07 02:32:00  0   0
10: 2015-09-07 02:47:00  0   0
11: 2015-09-07 03:02:00  0   0
12: 2015-09-07 03:17:00  0   0
13: 2015-09-07 03:32:00  1   0
14: 2015-09-07 03:47:00  0   0
15: 2015-09-07 04:02:00  1   2
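Explanation: the last part has the form DT[i=J(x=my_x), j, on="x", by=.EACHI], which can be read as:

  • Join DT on x with my_x
  • Then evaluate j for each subset determined by my_x

In this case, j=as.list(table(inOut)). The table must be coerced to a list to create multiple columns (one per level of inOut).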
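
The same pattern can be tried on a toy table. This is only a sketch: DT, x, g and my_x are hypothetical names chosen to mirror the placeholders in the explanation, and the data are made up.

```r
library(data.table)

# Toy table (made-up data): x is the join key, g is a factor with two levels
DT <- data.table(
  x = c("a", "a", "c"),
  g = factor(c("IN", "OUT", "IN"), levels = c("IN", "OUT"))
)

# Keys to look up, including "b", which has no matching rows in DT
my_x <- c("a", "b", "c")

# For each value of my_x, tabulate g over the matching rows of DT;
# as.list() turns the table into one column per level of g
res <- DT[J(x = my_x), as.list(table(g)), on = "x", by = .EACHI]
print(res)
```

The key "b" matches nothing, yet it should still get a row with zero counts, just like the empty intervals in the result above. This relies on g being a factor, so that table() knows both levels even when a subset has no rows.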

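Nice use of .EACHI.

@Frank, thanks, your data.table solution is very nice and clear. I am marking this as the answer and will create another question for a "dplyr" solution.

@JamesChen OK, fair enough. I would also like to see what people come up with. I don't know how dplyr would create multiple columns from the result.

@Frank, you can see the dplyr answer in this link, thanks.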