R: cutting fixed time intervals with 2 or more variables using dplyr
I have a data frame:
df <- data.frame(time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36", "2015-09-07 01:47:45",
"2015-09-07 02:00:17", "2015-09-07 02:07:30", "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21", "2015-09-07 04:04:22"),
inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT"))
> df
                  time inOut
1  2015-09-07 00:32:19    IN
2  2015-09-07 01:02:30   OUT
3  2015-09-07 01:31:36    IN
4  2015-09-07 01:47:45    IN
5  2015-09-07 02:00:17    IN
6  2015-09-07 02:07:30    IN
7  2015-09-07 03:39:41    IN
8  2015-09-07 04:04:21   OUT
9  2015-09-07 04:04:21    IN
10 2015-09-07 04:04:22   OUT
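When you create the example data set you will see a warning that you can ignore, or you can just use stringsAsFactors = FALSE.
You can also rename the columns at some point in the process and replace Var1 with something more useful.

You can reshape the table to get the desired format: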
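library(dplyr)     # provides %>%, group_by, summarise
library(reshape2)  # provides dcast
# Count rows per 15-minute interval and direction, then cast to wide format
df2 <- df %>%
  group_by(inOut,
           timeCut = cut(as.POSIXct(time), breaks = "15 min")) %>%
  summarise(n = n()) %>%
  dcast(timeCut ~ inOut, value.var = "n")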
Add all intervals, including the empty ones:
intervals <- data.frame(timeCut = levels(cut(as.POSIXct(df$time),
                                             breaks = "15 min")))
df3 <- df2 %>%
  mutate(timeCut = as.character(timeCut)) %>%
  merge(intervals, all = TRUE)
If needed, replace the NA values with 0:
df3[is.na(df3)] <- 0
> df3
               timeCut IN OUT
1  2015-09-07 00:32:00  1   0
2  2015-09-07 00:47:00  0   0
3  2015-09-07 01:02:00  0   1
4  2015-09-07 01:17:00  1   0
5  2015-09-07 01:32:00  0   0
6  2015-09-07 01:47:00  2   0
7  2015-09-07 02:02:00  1   0
8  2015-09-07 02:17:00  0   0
9  2015-09-07 02:32:00  0   0
10 2015-09-07 02:47:00  0   0
11 2015-09-07 03:02:00  0   0
12 2015-09-07 03:17:00  0   0
13 2015-09-07 03:32:00  1   0
14 2015-09-07 03:47:00  0   0
15 2015-09-07 04:02:00  1   2
The reshape2::dcast function has since been superseded by tidyr::spread, but I am not used to it yet. More details on data preparation were linked from the original answer.
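For comparison, here is a minimal sketch of the same result using tidyr instead of reshape2. It is not part of the original answer. It assumes dplyr and a reasonably recent tidyr are installed, fixes the time zone to UTC for reproducibility, and relies on tidyr::complete() filling in the factor levels that cut() creates for empty intervals. The name counts is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36",
           "2015-09-07 01:47:45", "2015-09-07 02:00:17", "2015-09-07 02:07:30",
           "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21",
           "2015-09-07 04:04:22"),
  inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  # cut() returns a factor whose levels cover every 15-minute interval
  mutate(timeCut = cut(as.POSIXct(time, tz = "UTC"), breaks = "15 min")) %>%
  # count rows per interval and direction (empty combinations are dropped here)
  count(timeCut, inOut) %>%
  # restore the empty combinations with a count of 0
  complete(timeCut, inOut, fill = list(n = 0L)) %>%
  # one column per direction; tidyr::pivot_wider is the newer equivalent
  spread(inOut, n)

print(counts)
```

Because the empty intervals are restored by complete(), no separate merge with a table of all intervals is needed in this variant.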
Your solution is missing time intervals.
Thanks Paul4forest, but as Miha points out, this solution is missing the row 2015-09-07 00:47:00 0 0. Anyway, with the solutions from Antoniosk and Miha plus your hint, things are clear enough for me now.
Thanks for your dplyr+tidyr solution.
Thanks for your dplyr+reshape2 solution.

Another solution using dplyr and reshape2:
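library(dplyr)
library(reshape2)
# All 15-minute interval labels, including intervals with no rows
my_levels <-
  data.frame(timeCut = levels(cut(as.POSIXct(df$time), breaks = "15 min")),
             stringsAsFactors = FALSE)
my_df <-
  df %>%
  mutate(timeCut = cut(as.POSIXct(time), breaks = "15 min")) %>%
  # convert every column to character so the join keys have the same type
  mutate(across(everything(), as.character)) %>%
  right_join(my_levels, by = "timeCut") %>%
  select(-time) %>%
  # empty intervals have inOut = NA after the join, hence the NA column below
  dcast(timeCut ~ inOut, fun.aggregate = length, value.var = "inOut")
my_df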
               timeCut IN OUT NA
1  2015-09-07 00:32:00  1   0  0
2  2015-09-07 00:47:00  0   0  1
3  2015-09-07 01:02:00  0   1  0
4  2015-09-07 01:17:00  1   0  0
5  2015-09-07 01:32:00  0   0  1
6  2015-09-07 01:47:00  2   0  0
7  2015-09-07 02:02:00  1   0  0
8  2015-09-07 02:17:00  0   0  1
9  2015-09-07 02:32:00  0   0  1
10 2015-09-07 02:47:00  0   0  1
11 2015-09-07 03:02:00  0   0  1
12 2015-09-07 03:17:00  0   0  1
13 2015-09-07 03:32:00  1   0  0
14 2015-09-07 03:47:00  0   0  1
15 2015-09-07 04:02:00  1   2  0
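
The NA column above comes from the empty intervals, whose inOut value is NA after the join. If that column is unwanted, one possible alternative is to cross-tabulate directly in base R. This is only a sketch and not something proposed in the thread. It assumes the time zone can be fixed to UTC, and the name counts is illustrative. It works because cut() returns a factor that already contains a level for every interval, and table() keeps empty levels.

```r
df <- data.frame(
  time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36",
           "2015-09-07 01:47:45", "2015-09-07 02:00:17", "2015-09-07 02:07:30",
           "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21",
           "2015-09-07 04:04:22"),
  inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT"),
  stringsAsFactors = FALSE
)

# Factor with one level per 15-minute interval, empty intervals included
timeCut <- cut(as.POSIXct(df$time, tz = "UTC"), breaks = "15 min")

# Cross-tabulate interval by direction; empty levels appear as rows of zeros
counts <- as.data.frame.matrix(table(timeCut, inOut = df$inOut))

# Move the interval labels from the row names into a regular column
counts <- data.frame(timeCut = rownames(counts), counts, row.names = NULL)

print(counts)
```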