Group, then create a 'break' if the datetime exceeds a certain time, with a new value in the original grouping column (R, dplyr)
I have a dataset, df:
Subject  Folder  Message  Date
A        Out              9/9/2019 5:46:38 PM
A        Out              9/9/2019 5:46:40 PM
A        Out              9/9/2019 5:46:42 PM
A        Out              9/9/2019 5:46:43 PM
A        Out              9/9/2019 9:30:00 PM
A        Out              9/9/2019 9:30:01 PM
B        Out              9/9/2019 9:35:00 PM
B        Out              9/9/2019 9:35:01 PM
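I am trying to group this by Subject, find the duration, and create a new Duration column. I would also like to apply a threshold for when the gap between datetimes exceeds a certain amount of time. My problem is that in group A the time jumps from 5:46 in the fourth row to 9:30 in the fifth row, which makes the duration for group A inaccurate. I would like to "break" at that point and compute a new duration, and also create a new value in Subject (A1) whenever the gap exceeds 10 minutes. I am not sure whether I should use a loop for this. Desired output: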
Subject Duration Group
A 5 sec outdata1
A1 1 sec outdata2
B 1 sec outdata3
Here is my dput:
structure(list(Subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L), .Label = c("A", "B"), class = "factor"), Folder = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Out", class = "factor"),
Message = c("", "", "", "", "", "", "", ""), Date = structure(1:8, .Label = c("9/9/2019 5:46:38 PM",
"9/9/2019 5:46:40 PM", "9/9/2019 5:46:42 PM", "9/9/2019 5:46:43 PM",
"9/9/2019 9:30:00 PM", "9/9/2019 9:30:01 PM", "9/9/2019 9:35:00 PM",
"9/9/2019 9:35:01 PM"), class = "factor")), row.names = c(NA,
-8L), class = "data.frame")
This is what I have tried:
thresh <- duration(10, units = "minutes")
df %>%
mutate(Date = mdy_hms(Date)) %>%
transmute(Subject, Duration = diff = difftime(as.POSIXct(Date, format =
"%m/%d/%Y %I:%M:%S %p"),as.POSIXct(Date,
format = "%m/%d/%Y %I:%M:%S %p" ), units = "secs")) %>%
ungroup %>%
distinct %>%
mutate(grp = str_c("Outdata", row_number()))
mutate(delta = if_else(grp < thresh1, grp, NA_real_))
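We can compute the duration between consecutive Date values to create new groups, and then compute the time difference between the min and max Date within each group.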
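library(dplyr)

thresh <- 10  # gap threshold, in minutes

df %>%
  # parse the 12-hour timestamps into POSIXct
  mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p")) %>%
  # start a new group whenever the gap to the previous row exceeds thresh
  group_by(Subject, Group = cumsum(difftime(Date,
      lag(Date, default = first(Date)), units = "mins") > thresh)) %>%
  # duration of each group: last timestamp minus first timestamp
  summarise(Duration = difftime(max(Date), min(Date), units = "secs")) %>%
  ungroup %>%
  mutate(Group = paste0('outdata', row_number()))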
# A tibble: 3 x 3
# Subject Group Duration
# <fct> <chr> <drtn>
#1 A outdata1 5 secs
#2 A outdata2 1 secs
#3 B outdata3 1 secs
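The output above keeps Subject as A for both of A's groups, whereas the question asked for a new value (A1) after the break. A minimal sketch of one way to get that label follows. It assumes the gap should be measured within each Subject rather than across the whole table, and the suffix rule (A, A1, A2, ...) is a hypothetical naming scheme inferred from the desired output, not something the answer specifies.

```r
library(dplyr)

# Example data rebuilt from the question (dates as character strings).
df <- data.frame(
  Subject = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Folder  = "Out",
  Date    = c("9/9/2019 5:46:38 PM", "9/9/2019 5:46:40 PM",
              "9/9/2019 5:46:42 PM", "9/9/2019 5:46:43 PM",
              "9/9/2019 9:30:00 PM", "9/9/2019 9:30:01 PM",
              "9/9/2019 9:35:00 PM", "9/9/2019 9:35:01 PM"),
  stringsAsFactors = FALSE
)

thresh <- 10  # gap threshold, in minutes

res <- df %>%
  mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")) %>%
  group_by(Subject) %>%
  # brk counts how many gaps longer than thresh have occurred so far
  # within the current subject (0 until the first break).
  mutate(brk = cumsum(as.numeric(difftime(Date,
      lag(Date, default = first(Date)), units = "mins")) > thresh)) %>%
  ungroup() %>%
  # Assumed labelling rule: keep "A" before any break, then "A1", "A2", ...
  mutate(Subject = if_else(brk == 0, as.character(Subject),
                           paste0(Subject, brk))) %>%
  group_by(Subject) %>%
  summarise(Duration = difftime(max(Date), min(Date), units = "secs")) %>%
  mutate(Group = paste0("outdata", row_number()))

print(res)
```

On the example data this gives three rows with Subject A, A1 and B and durations of 5, 1 and 1 seconds. Measuring the gap per subject also means the first row of B is never compared against the last row of A.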
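OK, I tried this and got the error: Error in as.POSIXlt.character(as.character(x), ...): character string is not in a standard unambiguous format. The datetimes are as shown above. I am looking into this strange error now.
If I use the code above on the data you shared with dput, it gives the correct output. In any case, you can change the second line and use what you were already using, mutate(Date = mdy_hms(Date)).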
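The swap suggested in that last comment can be sketched as follows. This assumes the lubridate package is installed. The as.character() wrapper is a precaution added here, since Date is stored as a factor in the dput; it is not part of the original suggestion. The .groups = "drop" argument stands in for the separate ungroup step.

```r
library(dplyr)
library(lubridate)

# Example data rebuilt from the question; Date is a factor, as in the dput.
df <- data.frame(
  Subject = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Date    = c("9/9/2019 5:46:38 PM", "9/9/2019 5:46:40 PM",
              "9/9/2019 5:46:42 PM", "9/9/2019 5:46:43 PM",
              "9/9/2019 9:30:00 PM", "9/9/2019 9:30:01 PM",
              "9/9/2019 9:35:00 PM", "9/9/2019 9:35:01 PM"),
  stringsAsFactors = TRUE
)

thresh <- 10  # gap threshold, in minutes

res <- df %>%
  # mdy_hms() parses month/day/year timestamps, including the AM/PM marker.
  mutate(Date = mdy_hms(as.character(Date))) %>%
  # New group whenever the gap to the previous row exceeds thresh.
  group_by(Subject, Group = cumsum(as.numeric(difftime(Date,
      lag(Date, default = first(Date)), units = "mins")) > thresh)) %>%
  summarise(Duration = difftime(max(Date), min(Date), units = "secs"),
            .groups = "drop") %>%
  mutate(Group = paste0("outdata", row_number()))

print(res)
```

Because mdy_hms() does not take a format string, this variant avoids having to get the strptime format exactly right, which may help if the real data differ slightly from the sample shown.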