R: Selecting elements of a column that fall within a given time period


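I want to select the activities that fall within a 30-minute range for the corresponding node, and remove the activities that are not within a 30-minute range.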

node <- c("ABC","ABC","ABCC","ABCC","ABCC","ABCC","ABCC","ABCC","ABCC","ABCC","ABCC","ABCC")

activity <-c("LOSS_OF_MULTIPLEX_SECTION-OMS_A","LOSS_OF_MULTIPLEX_SECTION-OMS_A","NODE_ISOLATION","NE_NOT_REACH_VIA_PRIMARY_MNG_INTERF","NODE_ISOLATION","LOSS_OF_MULTIPLEX_SECTION-OMS_A","NODE_ISOLATION","NE_NOT_REACH_VIA_PRIMARY_MNG_INTERF","NE_NOT_REACH_VIA_PRIMARY_MNG_INTERF", "UNDERLYING_RESOURCE_UNAVAILABLE-OMS_A","UNDERLYING_RESOURCE_UNAVAILABLE-OMS_A","UNDERLYING_RESOURCE_UNAVAILABLE-OMS_A") 

e <-c("2020-05-09 04:50:42","2020-05-09 06:16:54","2020-05-08 16:11:58","2020-05-08 16:11:58","2020-05-08 16:30:07","2020-05-09 03:00:08","2020-05-09 03:08:08","2020-05-09 03:28:08","2020-05-09 13:08:08","2020-05-09 13:10:08","2020-05-09 13:28:08","2020-05-09 14:28:08")

df <- data.frame(node, activity, e)
df

Is this what you want?

library(dplyr)

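# Next / previous event time, taken in time order within each group.
# An element with no neighbour gets a default 1801 secs (30 min + 1 sec) away,
# so a missing neighbour can never satisfy the 30-minute condition.
tlead <- . %>% lead(., order_by = ., default = max(.) + 1801)
tlag <- . %>% lag(., order_by = ., default = min(.) - 1801)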

df %>% 
  mutate(e = as.POSIXct(e, tz = "")) %>% 
  group_by(node) %>% 
  filter(e - tlag(e) <= as.difftime("00:30:00") | tlead(e) - e <= as.difftime("00:30:00"))
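
To see what the two helpers are built on, here is a small sketch on made-up numeric times (the vector x is hypothetical and not part of the question's data). With order_by, lag() and lead() shift the values in sorted order and return the result in the original row order; default fills the slot that has no neighbour.

```r
library(dplyr)

# Hypothetical event times for three activities A, B and C (arbitrary units)
x <- c(1, 3, 2)

# Time of the previous event for each element, in time order
prev_time <- lag(x, order_by = x)
# Time of the next event for each element, in time order
next_time <- lead(x, order_by = x)

# Same as prev_time, but the element with no predecessor gets a value
# 1801 units before the earliest time instead of NA
prev_filled <- lag(x, order_by = x, default = min(x) - 1801)

print(prev_time)    # NA 2 1
print(next_time)    # 2 NA 3
print(prev_filled)  # -1800 2 1
```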
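
I can't follow the logic here. Can you explain 1) why the filtered data frame df1 has more rows than the original data frame df? 2) what is column e? 3) how is the duration of an activity determined?

1) I have corrected my mistake. 2) The time at which each activity occurred. 3) I want to select all activities within a 30-minute range for the corresponding node and remove the others.

tlag / tlead is a function that returns the time of the previous / next activity. For example, if activities A, B and C occur at times 1, 3 and 2 respectively, tlag returns NA (i.e. no previous activity), 2 and 1. For this problem we set the default value of tlag to the minimum time minus 1801 seconds, because we assume that "no previous activity" (i.e. NA) means the previous activity happened at least 30 minutes and 1 second earlier. I then guess that "within 30 minutes" in your question should be read as "within 30 minutes after the previous activity or within 30 minutes before the next activity". So we need e <= tlag(e) + 30 minutes or e >= tlead(e) - 30 minutes.

@Isuru Thank you very much for the explanation. As an extension, is there any way to count the number of activities in a selected 30-minute period?

@Isuru Something like this:

df %>% mutate(e = as.POSIXct(e, tz = "")) %>% group_by(node) %>% summarise(n = sum(between(e, as.POSIXct("2020-05-09 03:00:00"), as.POSIXct("2020-05-09 03:30:00"))))

Yes, but how can this be adapted to the whole data set? When the data set is large, it is hard for us to find the time intervals by hand.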
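
One possible way to handle the whole data set without hard-coding any interval is to start a new burst whenever the gap to the previous activity of the same node exceeds 30 minutes, and then count the rows per burst. This is only a sketch under that assumption about what a "period" means: each activity is chained to the previous one, so a burst can span more than 30 minutes in total. The data frame events, the columns gap_mins and burst, and the fixed time zone "UTC" are made up for illustration, and .groups needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Small hypothetical data set in the same shape as df
events <- data.frame(
  node = c("ABC", "ABC", "ABCC", "ABCC", "ABCC", "ABCC"),
  e = c("2020-05-09 04:50:42", "2020-05-09 06:16:54",
        "2020-05-08 16:11:58", "2020-05-08 16:30:07",
        "2020-05-09 03:00:08", "2020-05-09 03:08:08"),
  stringsAsFactors = FALSE
)

counts <- events %>%
  mutate(e = as.POSIXct(e, tz = "UTC")) %>%
  arrange(node, e) %>%
  group_by(node) %>%
  mutate(
    # Minutes since the previous activity of the same node (NA for the first)
    gap_mins = as.numeric(difftime(e, lag(e), units = "mins")),
    # A new burst starts at the first row or after a gap of more than 30 minutes
    burst = cumsum(is.na(gap_mins) | gap_mins > 30)
  ) %>%
  group_by(node, burst) %>%
  summarise(start = min(e), end = max(e), n = n(), .groups = "drop")

print(counts)
```

Bursts with n equal to 1 are the isolated activities that the filter in the answer removes, so adding filter(n > 1) keeps only the periods that contain more than one activity.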

The filtered result the question expects, df1:

node <- c("ABCC","ABCC","ABCC","ABCC","ABCC","ABCC","ABCC","ABCC","ABCC")

activity <-c("NODE_ISOLATION","NE_NOT_REACH_VIA_PRIMARY_MNG_INTERF","NODE_ISOLATION","LOSS_OF_MULTIPLEX_SECTION-OMS_A","NODE_ISOLATION","NE_NOT_REACH_VIA_PRIMARY_MNG_INTERF","NE_NOT_REACH_VIA_PRIMARY_MNG_INTERF", "UNDERLYING_RESOURCE_UNAVAILABLE-OMS_A","UNDERLYING_RESOURCE_UNAVAILABLE-OMS_A") 

e <-c("2020-05-08 16:11:58","2020-05-08 16:11:58","2020-05-08 16:30:07","2020-05-09 03:00:08","2020-05-09 03:08:08","2020-05-09 03:28:08","2020-05-09 13:08:08","2020-05-09 13:10:08","2020-05-09 13:28:08")

df1 <- data.frame(node, activity, e)
df1 

Output of the filter applied to df:

# A tibble: 9 x 3
# Groups:   node [1]
  node  activity                              e                  
  <chr> <chr>                                 <dttm>             
1 ABCC  NODE_ISOLATION                        2020-05-08 16:11:58
2 ABCC  NE_NOT_REACH_VIA_PRIMARY_MNG_INTERF   2020-05-08 16:11:58
3 ABCC  NODE_ISOLATION                        2020-05-08 16:30:07
4 ABCC  LOSS_OF_MULTIPLEX_SECTION-OMS_A       2020-05-09 03:00:08
5 ABCC  NODE_ISOLATION                        2020-05-09 03:08:08
6 ABCC  NE_NOT_REACH_VIA_PRIMARY_MNG_INTERF   2020-05-09 03:28:08
7 ABCC  NE_NOT_REACH_VIA_PRIMARY_MNG_INTERF   2020-05-09 13:08:08
8 ABCC  UNDERLYING_RESOURCE_UNAVAILABLE-OMS_A 2020-05-09 13:10:08
9 ABCC  UNDERLYING_RESOURCE_UNAVAILABLE-OMS_A 2020-05-09 13:28:08