R: How to match multiple variables within a given range
Here is my data:
df <- data.frame(peak = c(1:5), RT = c(3, 3.6, 4, 4.1, 5), MZ = c(100, 200, 900, 100, 700))
library <- data.frame(Compound = c("A","B","C","D","E","F","G","H"), RT = c(3.11, 3.2, 4, 4.1, 4.2, 4.4, 4.9, 5), MZ = c(101, 200, 500, 250, 300, 330, 701, 702))
The library looks like this:
> library
Compound RT MZ
1 A 3.11 101
2 B 3.20 200
3 C 4.00 500
4 D 4.10 250
5 E 4.20 300
6 F 4.40 330
7 G 4.90 701
8 H 5.00 702
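I want to match df against the library to find the target compounds. A library entry counts as a match when its RT differs from the peak's RT by no more than 0.5 in either direction, i.e. c(-0.5, 0.5), and its MZ differs by no more than 5, i.e. c(-5, 5). The ideal result would therefore be: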
peak RT MZ Compound
1 1 3.0 100 A
2 2 3.6 200 B
3 3 4.0 900 NA
4 4 4.1 100 NA
5 5 5.0 700 G, H
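Also, it would be great if this could be solved without a for loop, because my actual list is very long.

We can use crossing() from tidyr to create all combinations of library and df. Using filter(), we keep only the rows that fall within both ranges, and then collapse Compound for each peak. The final right_join() brings back the peaks that had no match, so they show up with NA.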
library(dplyr)
tidyr::crossing(library, setNames(df, c('peak', 'RT1', 'MZ1'))) %>%
filter(abs(RT - RT1) <= 0.5 & abs(MZ - MZ1) <= 5) %>%
group_by(peak) %>%
summarise(Compound = toString(Compound)) %>%
right_join(df, by = 'peak')
# peak Compound RT MZ
# <int> <chr> <dbl> <dbl>
#1 1 A 3 100
#2 2 B 3.6 200
#3 3 NA 4 900
#4 4 NA 4.1 100
#5 5 G, H 5 700
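For comparison, here is a minimal base R sketch of the same matching rule, assuming the data from the question. It is not the answer's method: it swaps crossing() and filter() for outer(), which builds a peaks-by-library logical matrix. The names lib, rt_tol, mz_tol and hit are my own choices for illustration.

```r
# Same example data as in the question. The library table is called `lib`
# here (my renaming) so it is not confused with the library() function.
df <- data.frame(peak = 1:5,
                 RT = c(3, 3.6, 4, 4.1, 5),
                 MZ = c(100, 200, 900, 100, 700))
lib <- data.frame(Compound = c("A", "B", "C", "D", "E", "F", "G", "H"),
                  RT = c(3.11, 3.2, 4, 4.1, 4.2, 4.4, 4.9, 5),
                  MZ = c(101, 200, 500, 250, 300, 330, 701, 702))

# Tolerances taken from the question
rt_tol <- 0.5
mz_tol <- 5

# hit[i, j] is TRUE when library row j is within both tolerances of peak i
hit <- abs(outer(df$RT, lib$RT, "-")) <= rt_tol &
  abs(outer(df$MZ, lib$MZ, "-")) <= mz_tol

# Collapse the matching compound names per peak; NA when nothing matches
df$Compound <- apply(hit, 1, function(h) {
  if (any(h)) toString(lib$Compound[h]) else NA_character_
})

df
```

Like the crossing() approach, this still materialises every peak/library pair, so memory grows with the product of the two table sizes. For very long lists a non-equi join (for example in data.table, or join_by() in recent dplyr versions) may be worth exploring, though I have not benchmarked that here.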
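Please share reproducible data. Don't share images or links.
Thanks. The data is included in the last sentence, but that is still not ideal. Please try using dput or similar to include your data.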
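As a small illustration of the dput suggestion, the sketch below prints a pasteable text representation of the question's df and then reads it back from a file. The temporary file and the name df_restored are assumptions for the example only.

```r
df <- data.frame(peak = 1:5,
                 RT = c(3, 3.6, 4, 4.1, 5),
                 MZ = c(100, 200, 900, 100, 700))

# Print a text representation that can be pasted straight into a question
dput(df)

# Round trip: write the representation to a file and read it back with dget()
tmp <- tempfile(fileext = ".R")
dput(df, file = tmp)
df_restored <- dget(tmp)

# Check that the restored object matches the original
all.equal(df, df_restored)
```

Anyone answering can then recreate the object by assigning the printed structure(...) call to a variable, without retyping the table.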