Filtering conditional on a first condition in R
I want to keep rows that satisfy either of two conditions: (1) the row is marked "include" in some indicator column, or (2) the row falls within one minute after a row kept by the first condition.

I can solve this with standard filtering inside a loop, but is there a more sophisticated solution? I'm curious to see what people can come up with.

Here is a toy data set and the desired result based on the problem description:
library(tidyverse)
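For reference, the loop-based approach mentioned in the question could look roughly like the sketch below. The data frame is a hypothetical stand-in (the rows at 1.50, 4.90 and 8.00 are invented for illustration), and `keep` and `start` are names chosen here, not taken from the original post. Time is assumed to be in minutes and sorted ascending.

```r
# Hypothetical toy data: time in minutes, ind flags the "include" rows.
time <- c(0, 0.46, 0.73, 1.50, 2.23, 2.65, 3.18, 3.45,
          4.90, 5.78, 5.89, 6.51, 6.71, 8.00, 10.15, 10.75)
df <- data.frame(
  time = time,
  ind  = ifelse(time %in% c(0, 2.23, 2.65, 5.78, 10.15), "include", "exclude"),
  stringsAsFactors = FALSE
)

# Start with nothing kept, then switch on every row that falls inside
# the one-minute window opened by each "include" row.
keep <- logical(nrow(df))
for (i in which(df$ind == "include")) {
  start <- df$time[i]
  keep <- keep | (df$time >= start & df$time <= start + 1)
}

result <- df[keep, ]
print(result)
```

Each "include" row lies inside its own window, so it is kept without a separate check.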
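Here is an approach using Map from base R. First, we find the indices where ind == "include", then take all rows whose time lies between each corresponding time value and one minute later.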
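# Row indices flagged "include"
include_ind <- which(df$ind == "include")
# For each include time x, collect the rows with x <= time <= x + 1;
# unique() drops rows that are hit by more than one window
df[unique(unlist(Map(function(x, y) which(df$time >= x & df$time <= y),
                     df$time[include_ind], df$time[include_ind] + 1))), ]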
# A tibble: 13 x 2
#     time ind
#    <dbl> <chr>
#  1  0    include
#  2  0.46 exclude
#  3  0.73 exclude
#  4  2.23 include
#  5  2.65 include
#  6  3.18 exclude
#  7  3.45 exclude
#  8  5.78 include
#  9  5.89 exclude
# 10  6.51 exclude
# 11  6.71 exclude
# 12 10.2  include
# 13 10.8  exclude
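The unique() call matters whenever two "include" rows are less than a minute apart, because their windows overlap and the same row index is returned more than once. A small sketch on hypothetical data (the rows at 1.50, 4.90 and 8.00 are invented, and `windows`, `raw_idx` and `dup_rows` are illustrative names) makes the overlap visible:

```r
# Hypothetical toy data: time in minutes, ind flags the "include" rows.
time <- c(0, 0.46, 0.73, 1.50, 2.23, 2.65, 3.18, 3.45,
          4.90, 5.78, 5.89, 6.51, 6.71, 8.00, 10.15, 10.75)
df <- data.frame(
  time = time,
  ind  = ifelse(time %in% c(0, 2.23, 2.65, 5.78, 10.15), "include", "exclude"),
  stringsAsFactors = FALSE
)

include_ind <- which(df$ind == "include")

# One vector of row indices per "include" row (its one-minute window).
windows <- Map(function(x, y) which(df$time >= x & df$time <= y),
               df$time[include_ind], df$time[include_ind] + 1)

raw_idx  <- unlist(windows)
dup_rows <- raw_idx[duplicated(raw_idx)]

# Rows that sit in two overlapping windows (here the ones opened at 2.23 and 2.65).
print(df$time[dup_rows])

# Without unique(), those rows would appear twice in the result.
result <- df[unique(raw_idx), ]
print(nrow(df[raw_idx, ]))  # counts the repeated rows too
print(nrow(result))
```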
Here is a dplyr solution that starts a new group at each occurrence of "include" and then filters within those groups:
df %>%
  # Associate each row with the most recent "include"
  mutate(group = cumsum(ind == "include")) %>%
  group_by(group) %>%
  # Keep rows within one minute of the first row of their group
  filter(time <= (first(time) + 1))
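One edge case may be worth checking with the grouping idea: rows that come before the first "include" all get group 0, and that group starts with an "exclude" row. The posted output starts with an "include" row, so the issue does not show up there. The sketch below mimics the grouping in base R on hypothetical data (all values and the names `group_start`, `naive` and `guarded` are made up for illustration) and adds a group > 0 guard. In the dplyr pipeline the matching change would presumably be to add group > 0 as a further condition inside filter().

```r
# Hypothetical data that starts with two "exclude" rows.
df <- data.frame(
  time = c(0.20, 0.50, 1.00, 1.30, 2.50, 3.00),
  ind  = c("exclude", "exclude", "include", "exclude", "exclude", "include"),
  stringsAsFactors = FALSE
)

# Running count of "include" rows: rows before the first one get group 0.
group <- cumsum(df$ind == "include")

# Time of the first row in each group, repeated for every row of that group.
group_start <- ave(df$time, group, FUN = function(t) t[1])

# Same test as the dplyr filter: within one minute of the group's first row.
naive <- df$time <= group_start + 1

# Guarded version: additionally require that an "include" has been seen.
guarded <- group > 0 & naive

print(data.frame(df, group, naive, guarded))
```

In this example the naive test keeps the two leading "exclude" rows, while the guarded one drops them.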
With the data.table package, this can be done with a rolling join:
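library(data.table)
setDT(df)
# Join every row of df to the "include" times, rolling each include time
# forward by at most 1 minute; rows with no match are dropped
df[ind=="include","time"][df, on="time", roll=+1, nomatch=0L]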
#      time     ind
#  1:  0.00 include
#  2:  0.46 exclude
#  3:  0.73 exclude
#  4:  2.23 include
#  5:  2.65 include
#  6:  3.18 exclude
#  7:  3.45 exclude
#  8:  5.78 include
#  9:  5.89 exclude
# 10:  6.51 exclude
# 11:  6.71 exclude
# 12: 10.15 include
# 13: 10.75 exclude
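To see what the rolling join does conceptually, the same idea can be sketched in base R with findInterval(): look up the most recent "include" time for every row and keep the row if that time is at most one minute back. This is only an analogue on hypothetical data (the rows at 1.50, 4.90 and 8.00 are invented, and `inc_times`, `pos` and `last_inc` are illustrative names), not how data.table implements the join. Behaviour exactly on the one-minute boundary may differ.

```r
# Hypothetical toy data: time in minutes, sorted ascending.
time <- c(0, 0.46, 0.73, 1.50, 2.23, 2.65, 3.18, 3.45,
          4.90, 5.78, 5.89, 6.51, 6.71, 8.00, 10.15, 10.75)
df <- data.frame(
  time = time,
  ind  = ifelse(time %in% c(0, 2.23, 2.65, 5.78, 10.15), "include", "exclude"),
  stringsAsFactors = FALSE
)

# Times of the "include" rows; findInterval() needs them sorted ascending.
inc_times <- df$time[df$ind == "include"]

# For each row, the number of include times <= its time (0 if none yet).
pos <- findInterval(df$time, inc_times)

# Most recent include time per row; NA for rows before the first include.
last_inc <- c(NA, inc_times)[pos + 1]

# Keep rows whose most recent include is at most one minute old.
keep <- !is.na(last_inc) & (df$time - last_inc) <= 1
result <- df[keep, ]
print(result)
```

Only the most recent "include" needs to be checked, since any earlier one is further away in time.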
@thelatemail Yes, there was some overlap. I have corrected it now. Thanks :)

This is exactly what I was looking for, @Marius. Thank you very much.