R: How to match multiple variables within a given range


Here is my data:

df <- data.frame(peak = c(1:5), RT = c(3, 3.6, 4, 4.1, 5), MZ = c(100, 200, 900, 100, 700))
library <- data.frame(Compound = c("A","B","C","D","E","F","G","H"), RT = c(3.11, 3.2, 4, 4.1, 4.2, 4.4, 4.9, 5), MZ = c(101, 200, 500, 250, 300, 330, 701, 702))
And here is the library:

> library
  Compound   RT  MZ
1        A 3.11 101
2        B 3.20 200
3        C 4.00 500
4        D 4.10 250
5        E 4.20 300
6        F 4.40 330
7        G 4.90 701
8        H 5.00 702
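I want to match this table against the library to find the target compounds. The criteria are an RT deviation within c(-0.5, 0.5) and an MZ deviation within c(-5, 5). The ideal result would therefore be: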

  peak  RT  MZ Compound
1    1 3.0 100        A
2    2 3.6 200        B
3    3 4.0 900       NA
4    4 4.1 100       NA
5    5 5.0 700     G, H

Also, it would be great if this could be solved without a for loop, because my actual list is very long…

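We can use crossing from tidyr to create all combinations of the rows of library and df. Using filter, we keep only the rows that fall within the range, and then collapse Compound for each peak.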

library(dplyr)

tidyr::crossing(library, setNames(df, c('peak', 'RT1', 'MZ1'))) %>%
  filter(abs(RT - RT1) <= 0.5 & abs(MZ - MZ1) <= 5) %>%
  group_by(peak) %>%
  summarise(Compound = toString(Compound)) %>%
  right_join(df, by = 'peak')

#   peak Compound    RT    MZ
#  <int> <chr>    <dbl> <dbl>
#1     1 A          3     100
#2     2 B          3.6   200
#3     3 NA         4     900
#4     4 NA         4.1   100
#5     5 G, H       5     700
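
For comparison, the same tolerance test can be sketched in base R without any packages. This is only an illustrative alternative, not part of the answer above: outer builds a peaks-by-compounds matrix of differences for each variable, and the two logical matrices are then combined. The names lib, rt_ok, mz_ok and hit are made up for this example (lib is used instead of library so the data frame does not share a name with the library() function).

```r
# Data from the question; `lib` is a hypothetical name for the library table.
df <- data.frame(peak = 1:5,
                 RT = c(3, 3.6, 4, 4.1, 5),
                 MZ = c(100, 200, 900, 100, 700))
lib <- data.frame(Compound = c("A", "B", "C", "D", "E", "F", "G", "H"),
                  RT = c(3.11, 3.2, 4, 4.1, 4.2, 4.4, 4.9, 5),
                  MZ = c(101, 200, 500, 250, 300, 330, 701, 702),
                  stringsAsFactors = FALSE)

# One row per peak, one column per library compound.
rt_ok <- abs(outer(df$RT, lib$RT, "-")) <= 0.5  # RT within +/- 0.5
mz_ok <- abs(outer(df$MZ, lib$MZ, "-")) <= 5    # MZ within +/- 5
hit   <- rt_ok & mz_ok                          # both criteria satisfied

# Collapse the matching compound names per peak; NA when nothing matches.
df$Compound <- apply(hit, 1, function(h) {
  if (any(h)) toString(lib$Compound[h]) else NA_character_
})

print(df)
#   peak  RT  MZ Compound
# 1    1 3.0 100        A
# 2    2 3.6 200        B
# 3    3 4.0 900     <NA>
# 4    4 4.1 100     <NA>
# 5    5 5.0 700     G, H
```

Like crossing, this still compares every peak with every library row, so memory use grows with the product of the two table sizes; it avoids an explicit for loop but not the pairwise comparison itself.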

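Please share reproducible data. Do not share images or links. Thanks. The data is included in the last sentence, but that is not any better. Please try using dput or something similar to include your data.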