R: Filter records in one data frame based on another data frame by matching multiple conditions


I have the following two data frames, dat1 and dat2:

library(tidyverse)
dat1 <- tribble(
  ~"subj", ~"drive", ~"measure",
  "A", 1, 1,
  "A", 1, 2,
  "A", 1, 3,
  "A", 1, 4,
  "A", 1, 5,
  "A", 2, 1,
  "A", 2, 2,
  "A", 2, 3,
  "A", 2, 4,
  "A", 2, 5,
  "B", 1, 1,
  "B", 1, 2,
  "B", 1, 3,
  "B", 1, 4,
  "B", 1, 5,
  "B", 2, 1,
  "B", 2, 2,
  "B", 2, 3,
  "B", 2, 4,
  "B", 2, 5,
)

dat2 <- tribble(
  ~"subj", ~"drive", ~"measure",
  "A", 1, 3,
  "B", 2, 4
)

I know about dplyr::semi_join(), but it doesn't let me filter by a range. Is there a way around this? A tidyverse-based solution would be great.

Edited to use the native sqldf string substitution mentioned in GG's comment, instead of sprintf:

library(sqldf)

check_range <- 1

fn$sqldf('
select  one.*
from    dat1 one
        join dat2 two
          on  one.subj = two.subj
              and one.drive = two.drive
              and one.measure - two.measure between -`check_range` and `check_range`
')
#   subj drive measure
# 1    A     1       2
# 2    A     1       3
# 3    A     1       4
# 4    B     2       3
# 5    B     2       4
# 6    B     2       5

One option is to do an inner_join first and then filter the joined rows with between():

library(dplyr)
inner_join(dat1, dat2, by = c('subj', 'drive')) %>% 
    group_by(subj, drive) %>% 
    filter(between(measure.x, first(measure.y)-1, first(measure.y) + 1)) %>% 
    select(measure = measure.x)
# A tibble: 6 x 3
# Groups:   subj, drive [2]
#  subj  drive measure
#  <chr> <dbl>   <dbl>
#1 A         1       2
#2 A         1       3
#3 A         1       4
#4 B         2       3
#5 B         2       4
#6 B         2       5
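
The grouped filter above reads only first(measure.y) in each subj/drive group. As a hedged variant, the sketch below compares every joined row with its own reference value instead, so no grouping is needed. It rebuilds dat1 and dat2 compactly so it runs on its own; check_range is borrowed from the sqldf answer, and result is just an illustrative name.

```r
library(dplyr)

# Same data as in the question, built compactly
dat1 <- tibble(
  subj    = rep(c("A", "B"), each = 10),
  drive   = rep(rep(c(1, 2), each = 5), times = 2),
  measure = rep(c(1, 2, 3, 4, 5), times = 4)
)
dat2 <- tibble(subj = c("A", "B"), drive = c(1, 2), measure = c(3, 4))

check_range <- 1  # half-width of the allowed window (assumed name)

result <- inner_join(dat1, dat2, by = c("subj", "drive")) %>%
  # measure.x comes from dat1, measure.y is the reference value from dat2
  filter(abs(measure.x - measure.y) <= check_range) %>%
  select(subj, drive, measure = measure.x)

result
```

If dat2 ever held several rows for the same subj/drive pair, a dat1 row within range of more than one reference value would appear once per match, and dplyr 1.1.0 or later would warn about the many-to-many relationship.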

Do you mean that the results must differ from the values in dat2 by exactly one unit, or that they should fall within the range [-1, +1]? If the former, you should rephrase your question. If the latter, the results should also include "A", 1, 3 and "B", 2, 4, right?

@Brunox13, thanks! I edited the question.

I slightly edited the question based on the error Brunox13 pointed out. You may want to update your answer to match the new result data frame in my question. Thanks!

Thank you. Updated the post.

Or, with data.table:

library(data.table)
setDT(dat1)[setDT(dat2), .SD[between(measure, i.measure -1,
          i.measure + 1)], on = .(subj, drive), by = .EACHI]
#    subj drive measure
#1:    A     1       2
#2:    A     1       3
#3:    A     1       4
#4:    B     2       3
#5:    B     2       4
#6:    B     2       5

For completeness, here is also a solution using a non-equi join:

library(data.table)
range <- 1
idx <- setDT(dat1)[
  setDT(dat2)[, .(subj, drive, lower = measure - range, upper = measure + range)], 
  on = .(subj, drive, measure >= lower, measure <= upper), which = TRUE]
dat1[idx]
#    subj drive measure
# 1:    A     1       2
# 2:    A     1       3
# 3:    A     1       4
# 4:    B     2       3
# 5:    B     2       4
# 6:    B     2       5
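
Since the question asked for a tidyverse solution, the same non-equi idea can probably be kept inside dplyr. The sketch below assumes dplyr 1.1.0 or later, where join_by() accepts a between() condition and semi_join() accepts a join_by() specification. The names check_range (borrowed from the sqldf answer), bounds and result are illustrative; lower and upper mirror the data.table answer.

```r
library(dplyr)  # join_by() needs dplyr >= 1.1.0

# Same data as in the question, built compactly
dat1 <- tibble(
  subj    = rep(c("A", "B"), each = 10),
  drive   = rep(rep(c(1, 2), each = 5), times = 2),
  measure = rep(c(1, 2, 3, 4, 5), times = 4)
)
dat2 <- tibble(subj = c("A", "B"), drive = c(1, 2), measure = c(3, 4))

check_range <- 1  # half-width of the allowed window (assumed name)

# Turn each reference measure into a closed interval [lower, upper]
bounds <- dat2 %>%
  transmute(subj, drive,
            lower = measure - check_range,
            upper = measure + check_range)

# Keep the rows of dat1 whose measure falls inside a matching interval
result <- semi_join(
  dat1, bounds,
  by = join_by(subj, drive, between(measure, lower, upper))
)

result
```

Because semi_join() only filters dat1, each row is returned at most once even when it falls inside more than one interval, and no columns from dat2 are added.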