Filtering conditional on a first condition in R


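I want to filter rows based on two conditions. A row should be kept if either:

(1) the row is marked "include" in an indicator column, or

(2) the row falls within one minute after a row kept by the first condition.

I could solve this with standard filtering inside a loop, but is there a more elegant solution? I'd love to see what people can come up with.

Here is a toy dataset and the desired result, based on the problem description: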

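library(tidyverse)

df

Here is one approach using Map from base R. First, we find the indices where ind == "include", then take all rows whose time lies between each corresponding time value and one minute later.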

include_ind <- which(df$ind == "include")
df[unique(unlist(Map(function(x, y) which(df$time >= x & df$time <= y),
                 df$time[include_ind], df$time[include_ind] + 1))), ]

# A tibble: 13 x 2
#    time ind    
#   <dbl> <chr>  
# 1  0    include
# 2  0.46 exclude
# 3  0.73 exclude
# 4  2.23 include
# 5  2.65 include
# 6  3.18 exclude
# 7  3.45 exclude
# 8  5.78 include
# 9  5.89 exclude
#10  6.51 exclude
#11  6.71 exclude
#12 10.2  include
#13 10.8  exclude
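As a rough alternative to the Map call above, the same rule can be sketched in vectorised base R with findInterval, which looks up the most recent "include" time at or before each row. The data frame toy and its values are hypothetical and are not the dataset from the question; the sketch also assumes time is numeric and measured in minutes.

```r
# Hypothetical toy data, made up for illustration (not the question's df)
toy <- data.frame(
  time = c(0, 0.5, 1.5, 2, 2.8, 3.5, 5),
  ind  = c("include", "exclude", "exclude", "include",
           "exclude", "exclude", "exclude"),
  stringsAsFactors = FALSE
)

# Sorted times of all "include" rows (findInterval needs a sorted vector)
include_times <- sort(toy$time[toy$ind == "include"])

# For each row, the number of include times <= its time (0 if none)
idx <- findInterval(toy$time, include_times)

# Time of the most recent "include" at or before each row, NA if there is none
last_include <- ifelse(idx > 0, include_times[pmax(idx, 1)], NA_real_)

# Keep rows that lie within one minute after their most recent "include"
keep <- !is.na(last_include) & (toy$time - last_include) <= 1
result <- toy[keep, ]
print(result)
```

On this toy data the rows at times 0, 0.5, 2 and 2.8 are kept. Checking only the most recent "include" is enough here, because any earlier "include" is further away in time.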

Here is a dplyr solution that splits the data into groups at each occurrence of "include", then filters within those groups:

df %>%
    # Associate each row with the most recent "include"
    mutate(group = cumsum(ind == "include")) %>%
    group_by(group) %>%
    filter(time <= (first(time) + 1))
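
One edge case may be worth guarding against: rows that come before the first "include" end up in group 0, where first(time) is just the time of an "exclude" row, so some of them would pass the filter. The output shown in this thread starts with an "include" row, so this may not matter for the original data. The sketch below adds a group > 0 check; the tibble toy and its values are hypothetical, and the data is assumed to be sortable by time.

```r
library(dplyr)

# Hypothetical toy data whose first row is NOT an "include"
toy <- tibble(
  time = c(0.2, 1, 1.6, 2.4, 4),
  ind  = c("exclude", "include", "exclude", "exclude", "exclude")
)

result <- toy %>%
  arrange(time) %>%
  # Associate each row with the most recent "include" (0 = none yet)
  mutate(group = cumsum(ind == "include")) %>%
  group_by(group) %>%
  # Drop rows before the first "include", then apply the one-minute window
  filter(group > 0, time <= first(time) + 1) %>%
  ungroup() %>%
  select(-group)

print(result)
```

With this toy data only the rows at times 1 and 1.6 remain; without the group > 0 condition the row at time 0.2 would also be kept.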

In the data.table package, this can be done with a rolling join:

library(data.table)
setDT(df)   # convert df to a data.table by reference
df[ind=="include","time"][df, on="time", roll=+1, nomatch=0L]
#     time     ind
# 1:  0.00 include
# 2:  0.46 exclude
# 3:  0.73 exclude
# 4:  2.23 include
# 5:  2.65 include
# 6:  3.18 exclude
# 7:  3.45 exclude
# 8:  5.78 include
# 9:  5.89 exclude
#10:  6.51 exclude
#11:  6.71 exclude
#12: 10.15 include
#13: 10.75 exclude
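
To make the rolling join above easier to follow, here is a self-contained sketch that splits it into two steps. The table toy, the helper name includes, and all values are hypothetical and not taken from the question. The sketch relies on roll = 1 carrying each "include" time forward by at most one unit of time, and on nomatch = 0L dropping rows that find no "include" within that window.

```r
library(data.table)

# Hypothetical toy data, made up for illustration (not the question's df)
toy <- data.table(
  time = c(0, 0.5, 1.5, 2, 2.8, 3.5, 5),
  ind  = c("include", "exclude", "exclude", "include",
           "exclude", "exclude", "exclude")
)

# Step 1: a one-column lookup table holding only the "include" times
includes <- toy[ind == "include", .(time)]

# Step 2: for every row of toy, look for an "include" time at or before it,
# rolled forward by at most 1; rows without such a match are dropped
result <- includes[toy, on = "time", roll = 1, nomatch = 0L]

print(result)
```

The time column of the result holds the times of the rows of toy, not the matched "include" times, which is why the join output can be used directly as the filtered table.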

@最近的邮件 Yes, there was some overlap. I've corrected it now. Thanks :)

This is exactly what I was looking for, @Marius. Thank you very much.