R: How to match multiple variables within a given range (tags: r, dplyr, match)


Here is my data:

df <- data.frame(peak = c(1:5), RT = c(3, 3.6, 4, 4.1, 5), MZ = c(100, 200, 900, 100, 700))
library <- data.frame(Compound = c("A","B","C","D","E","F","G","H"), RT = c(3.11, 3.2, 4, 4.1, 4.2, 4.4, 4.9, 5), MZ = c(101, 200, 500, 250, 300, 330, 701, 702))

And this is the library:

> library
  Compound   RT  MZ
1        A 3.11 101
2        B 3.20 200
3        C 4.00 500
4        D 4.10 250
5        E 4.20 300
6        F 4.40 330
7        G 4.90 701
8        H 5.00 702

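I want to match this table against the library to find the target compounds. The criteria are an RT deviation within c(-0.5, 0.5) and an MZ deviation within c(-5, 5). The ideal result would therefore look like this: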

  peak  RT  MZ Compound
1    1 3.0 100        A
2    2 3.6 200        B
3    3 4.0 900       NA
4    4 4.1 100       NA
5    5 5.0 700     G, H

It would also be great if this could be solved without a for loop, because my actual list is very long.

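We can use crossing from tidyr to create all combinations of library and df. With filter we keep only the rows that fall within both ranges, then collapse the Compound values for each peak with toString. The final right_join brings back the peaks that had no match, so they appear with NA.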

library(dplyr)

tidyr::crossing(library, setNames(df, c('peak', 'RT1', 'MZ1'))) %>%
  filter(abs(RT - RT1) <= 0.5 & abs(MZ - MZ1) <= 5) %>%
  group_by(peak) %>%
  summarise(Compound = toString(Compound)) %>%
  right_join(df, by = 'peak')

#   peak Compound    RT    MZ
#  <int> <chr>    <dbl> <dbl>
#1     1 A          3     100
#2     2 B          3.6   200
#3     3 NA         4     900
#4     4 NA         4.1   100
#5     5 G, H       5     700
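
For comparison, the same range test can be sketched in base R without tidyr or dplyr. This is only an illustration under stated assumptions, not the answer's method: the library table is renamed lib here (a hypothetical name, chosen to avoid confusion with the library() function), and outer() is used to build the peak-by-compound difference matrices. Like crossing, it compares every peak with every compound, so memory use grows with nrow(df) * nrow(lib).

```r
# Data from the question; `lib` is a renamed copy of the `library` table (assumption)
df <- data.frame(peak = c(1:5), RT = c(3, 3.6, 4, 4.1, 5), MZ = c(100, 200, 900, 100, 700))
lib <- data.frame(Compound = c("A","B","C","D","E","F","G","H"),
                  RT = c(3.11, 3.2, 4, 4.1, 4.2, 4.4, 4.9, 5),
                  MZ = c(101, 200, 500, 250, 300, 330, 701, 702))

# Logical matrices: rows are peaks, columns are library compounds
rt_ok <- abs(outer(df$RT, lib$RT, "-")) <= 0.5
mz_ok <- abs(outer(df$MZ, lib$MZ, "-")) <= 5
hit <- rt_ok & mz_ok

# Collapse the matching compound names per peak; NA when nothing matches
df$Compound <- apply(hit, 1, function(h) {
  if (any(h)) toString(lib$Compound[h]) else NA_character_
})

df
```

On the sample data this gives A, B, NA, NA and "G, H" for peaks 1 to 5, the same assignment as the table requested in the question.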


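Please share reproducible data; do not share images or links.
Thanks, the data is included in the last sentence.
That is still not ideal. Please try to include your data using dput or something similar.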
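
As a minimal sketch of what the dput suggestion means in practice: dput() prints an R expression that rebuilds an object, and that text can be pasted into a question. The round trip through capture.output() and parse() below is only there to show that the printed text really recreates the data; the names txt and df_copy are illustrative.

```r
df <- data.frame(peak = c(1:5), RT = c(3, 3.6, 4, 4.1, 5), MZ = c(100, 200, 900, 100, 700))

# Print a structure(...) expression that recreates df; this is what to paste
dput(df)

# Round trip: capture the printed text, then parse and evaluate it again
txt <- paste(capture.output(dput(df)), collapse = "\n")
df_copy <- eval(parse(text = txt))

# TRUE when the rebuilt data frame matches the original
isTRUE(all.equal(df, df_copy))
```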
