R: Filter records in one data frame based on another data frame by matching multiple conditions
I have the following two data frames, dat1 and dat2:
library(tidyverse)
dat1 <- tribble(
~"subj", ~"drive", ~"measure",
"A", 1, 1,
"A", 1, 2,
"A", 1, 3,
"A", 1, 4,
"A", 1, 5,
"A", 2, 1,
"A", 2, 2,
"A", 2, 3,
"A", 2, 4,
"A", 2, 5,
"B", 1, 1,
"B", 1, 2,
"B", 1, 3,
"B", 1, 4,
"B", 1, 5,
"B", 2, 1,
"B", 2, 2,
"B", 2, 3,
"B", 2, 4,
"B", 2, 5,
)
dat2 <- tribble(
~"subj", ~"drive", ~"measure",
"A", 1, 3,
"B", 2, 4
)
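I know about dplyr::semi_join(), but it does not let me filter by a range (here, measure within ±1 of the value in dat2). Is there a way to solve this? A tidyverse-based solution would be great.
Edited to use the native sqldf string substitution mentioned in GG's comment, rather than sprintf: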
library(sqldf)
check_range <- 1
fn$sqldf('
select one.*
from dat1 one
join dat2 two
on one.subj = two.subj
and one.drive = two.drive
and one.measure - two.measure between -`check_range` and `check_range`
')
# subj drive measure
# 1 A 1 2
# 2 A 1 3
# 3 A 1 4
# 4 B 2 3
# 5 B 2 4
# 6 B 2 5
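For comparison, the same join-then-filter logic can be sketched in base R without any packages. This is only an illustration, not part of the original answer: the data frames are rebuilt compactly so the snippet runs on its own, and the names ref, ref_measure, joined and keep are made up for the example.

# Rebuild the example data (same values as dat1 and dat2 in the question)
dat1 <- data.frame(
  subj    = rep(c("A", "B"), each = 10),
  drive   = rep(rep(c(1, 2), each = 5), times = 2),
  measure = rep(c(1, 2, 3, 4, 5), times = 4)
)
dat2 <- data.frame(subj = c("A", "B"), drive = c(1, 2), measure = c(3, 4))

check_range <- 1

# Rename the reference measure so it does not clash with dat1$measure
ref <- setNames(dat2, c("subj", "drive", "ref_measure"))

# Equi-join on subj and drive
joined <- merge(dat1, ref, by = c("subj", "drive"))

# Keep rows whose measure lies within check_range of the reference value
keep <- abs(joined$measure - joined$ref_measure) <= check_range
res <- joined[keep, c("subj", "drive", "measure")]
res

The condition abs(measure - ref_measure) <= check_range is the same test as the "between -check_range and check_range" clause in the SQL above.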
One option is to do an inner join first, then filter with between():
library(dplyr)
inner_join(dat1, dat2, by = c('subj', 'drive')) %>%
group_by(subj, drive) %>%
filter(between(measure.x, first(measure.y)-1, first(measure.y) + 1)) %>%
select(measure = measure.x)
# A tibble: 6 x 3
# Groups: subj, drive [2]
# subj drive measure
# <chr> <dbl> <dbl>
#1 A 1 2
#2 A 1 3
#3 A 1 4
#4 B 2 3
#5 B 2 4
#6 B 2 5
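If dplyr 1.1.0 or later is available, the range condition can probably be expressed directly in the join with join_by() and its between() helper, which avoids the group_by() step. The following is a sketch under that version assumption, not part of the original answer; the data is rebuilt so the snippet is self-contained, and bounds, lower, upper and res are names chosen for illustration.

library(dplyr)  # join_by() requires dplyr >= 1.1.0

# Rebuild the example data (same values as dat1 and dat2 in the question)
dat1 <- tibble(
  subj    = rep(c("A", "B"), each = 10),
  drive   = rep(rep(c(1, 2), each = 5), times = 2),
  measure = rep(c(1, 2, 3, 4, 5), times = 4)
)
dat2 <- tibble(subj = c("A", "B"), drive = c(1, 2), measure = c(3, 4))

check_range <- 1

# Turn each dat2 row into an interval [lower, upper] around its measure
bounds <- dat2 %>%
  transmute(subj, drive,
            lower = measure - check_range,
            upper = measure + check_range)

# Equality on subj and drive, plus lower <= measure <= upper
res <- dat1 %>%
  inner_join(bounds, by = join_by(subj, drive, between(measure, lower, upper))) %>%
  select(subj, drive, measure)
res

Because the interval bounds are ordinary columns, changing check_range, or using a different width per row of dat2, needs no change to the join itself.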
Do you mean that the results must differ from the values in dat2 by exactly one unit, or that they should fall within the range [-1, +1]? If the former, you should rephrase your question. If the latter, the result should also contain "A", 1, 3 and "B", 2, 4, right?
@Brunox13, thank you! I have edited the question.
I edited the question slightly based on the error Brunox13 pointed out. You may want to update your answer to match the new result data frame in my question. Thanks!
Thank you. Updated the post.
library(data.table)
setDT(dat1)[setDT(dat2), .SD[between(measure, i.measure - 1,
i.measure + 1)], on = .(subj, drive), by = .EACHI]
# subj drive measure
#1: A 1 2
#2: A 1 3
#3: A 1 4
#4: B 2 3
#5: B 2 4
#6: B 2 5
For the sake of completeness, here is also a solution using a non-equi join:
library(data.table)
range <- 1
idx <- setDT(dat1)[
setDT(dat2)[, .(subj, drive, lower = measure - range, upper = measure + range)],
on = .(subj, drive, measure >= lower, measure <= upper), which = TRUE]
dat1[idx]
# subj drive measure
#1: A 1 2
#2: A 1 3
#3: A 1 4
#4: B 2 3
#5: B 2 4
#6: B 2 5
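One caveat with any join-based approach: if the reference table holds several rows for the same subj and drive whose ranges overlap, a row of dat1 can match more than once. The sketch below illustrates this with a hypothetical reference table dat2b (not part of the question) and deduplicates the row indices; tol, bounds and res are likewise names invented for the example.

library(data.table)

# Rebuild the example data (same values as dat1 in the question)
dat1 <- data.table(
  subj    = rep(c("A", "B"), each = 10),
  drive   = rep(rep(c(1, 2), each = 5), times = 2),
  measure = rep(c(1, 2, 3, 4, 5), times = 4)
)

# Hypothetical reference table: both rows target subject A, drive 1,
# and their +/- 1 windows ([2, 4] and [3, 5]) overlap
dat2b <- data.table(subj = c("A", "A"), drive = c(1, 1), measure = c(3, 4))

tol <- 1
bounds <- dat2b[, .(subj, drive, lower = measure - tol, upper = measure + tol)]

# nomatch = NULL drops reference rows without any match instead of returning NA
idx <- dat1[bounds,
            on = .(subj, drive, measure >= lower, measure <= upper),
            which = TRUE, nomatch = NULL]

# Rows falling into both windows appear twice in idx, so deduplicate
res <- dat1[sort(unique(idx))]
res

With the original dat2 the ranges do not overlap, so the deduplication step changes nothing there.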