R: Search a data frame for values close to a list of numbers

r search dataframe


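Note: the "data frame" in this question is actually a matrix. I have kept the wording so that the answers still make sense, since the solutions apply to both data frames and matrices.

I have a data frame containing a list of masses and their corresponding intensities: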

> df <- cbind(c(3.43534, 5.324, 9.322, 123.234), c(31, 4214, 112, 44))
> colnames(df) <- c("Mass", "I")
> df
          Mass    I
[1,]   3.43534   31
[2,]   5.32400 4214
[3,]   9.32200  112
[4,] 123.23400   44

The real-life data has thousands of rows, so unfortunately a loop is too slow.

This should do it:
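Because the object built with `cbind()` above is a matrix, column access with `$` will not work on it as written. The following is a minimal sketch of the difference, using the question's numbers; the names `m`, `d` and `res` are introduced here only for illustration.

```r
# The question's object, built as a matrix (column names set inside cbind)
m <- cbind(Mass = c(3.43534, 5.324, 9.322, 123.234),
           I    = c(31, 4214, 112, 44))

# The same data as a data frame
d <- as.data.frame(m)

# Bracket indexing by column name works for both classes
mass_from_matrix <- m[, "Mass"]
mass_from_frame  <- d[, "Mass"]

# `$` works on the data frame only; on a matrix it raises an error,
# which is captured here instead of stopping the script
res <- try(m$Mass, silent = TRUE)
dollar_fails_on_matrix <- inherits(res, "try-error")
```

So code written with `df$Mass` needs either `as.data.frame(df)` first or `df[, "Mass"]` when the input is a matrix.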

tol <- 0.2
compounds[which(do.call(pmin,as.data.frame(abs(outer(compounds$Mass, df$Mass, "-")))) < tol),]
##  Mass Compound
##1 3.39        A
##3 9.31        C
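To make the one-liner easier to follow, here is the same computation split into steps. This is only an unpacked sketch of the expression above; the intermediate names `d`, `nearest` and `hits` are introduced for illustration, and the four-compound data follows the second answer's definition.

```r
df <- data.frame(Mass = c(3.43534, 5.324, 9.322, 123.234),
                 I = c(31, 4214, 112, 44))
compounds <- data.frame(Mass = c(3.39, 102.93, 9.31, 144.00),
                        Compound = c("A", "B", "C", "D"),
                        stringsAsFactors = FALSE)
tol <- 0.2

# Step 1: absolute difference for every (compound, observed mass) pair.
# Rows correspond to compounds, columns to rows of df.
d <- abs(outer(compounds$Mass, df$Mass, "-"))

# Step 2: smallest distance per compound. as.data.frame() turns the matrix
# into a list of columns, and pmin() takes the element-wise minimum across
# them, which is a vectorised row-wise minimum.
nearest <- do.call(pmin, as.data.frame(d))

# Step 3: keep the compounds whose nearest observed mass is within tol
hits <- compounds[which(nearest < tol), ]
hits
```

Only compounds A and C have an observed mass within 0.2, which matches the output shown above.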

A slightly faster variant is the `which(..., arr.ind=TRUE)` one-liner shown after the comments below.

Data:

df <- structure(list(Mass = c(3.43534, 5.324, 9.322, 123.234), I = c(31, 
4214, 112, 44)), .Names = c("Mass", "I"), row.names = c(NA, -4L
), class = "data.frame")
##       Mass    I
##1   3.43534   31
##2   5.32400 4214
##3   9.32200  112
##4 123.23400   44

compounds <- structure(list(Mass = c(3.39, 102.93, 9.31, 144), Compound = structure(1:4, .Label = c("A", 
"B", "C", "D"), class = "factor")), .Names = c("Mass", "Compound"), row.names = c(NA, 
-4L), class = "data.frame")
##    Mass Compound
##1   3.39        A
##2 102.93        B
##3   9.31        C
##4 144.00        D

Here is a solution I came up with that joins the two data sets and then filters on the required condition:
df <- data.frame(Mass = c(3.43534, 5.324, 9.322, 123.234),
                 I = c(31, 4214, 112, 44))
compounds <- data.frame(Mass = c(3.39, 102.93, 9.310, 144.00),
                        Compound = c("A", "B", "C", "D"),
                        stringsAsFactors = FALSE)

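# Pair every observed mass with every compound row (a cross join)
combos <- expand.grid(df$Mass, seq_len(nrow(compounds)))
combos <- cbind(Var1 = combos$Var1, compounds[combos$Var2, ])
# Keep each compound that lies within 0.2 of at least one observed mass
unique(combos[(combos$Mass > combos$Var1 - 0.2 & combos$Mass < combos$Var1 + 0.2),
              c('Mass', 'Compound')])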
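The same join-then-filter idea can also be sketched with base R's `merge()`, which returns the Cartesian product of its inputs when `by = NULL`. This is an alternative phrasing rather than the answer's own code; the column name `ObsMass` and the variables `all_pairs` and `hits` are hypothetical names chosen here to avoid a clash with `compounds$Mass`.

```r
df <- data.frame(Mass = c(3.43534, 5.324, 9.322, 123.234),
                 I = c(31, 4214, 112, 44))
compounds <- data.frame(Mass = c(3.39, 102.93, 9.310, 144.00),
                        Compound = c("A", "B", "C", "D"),
                        stringsAsFactors = FALSE)
tol <- 0.2

# Cross join: one row per (observed mass, compound) combination
all_pairs <- merge(data.frame(ObsMass = df$Mass), compounds, by = NULL)

# abs(a - b) < tol is the same test as b - tol < a & a < b + tol
close_enough <- abs(all_pairs$Mass - all_pairs$ObsMass) < tol

# A compound may match several observed masses, so de-duplicate
hits <- unique(all_pairs[close_enough, c("Mass", "Compound")])
hits
```

Like the `expand.grid()` version, this materialises all `nrow(df) * nrow(compounds)` combinations, so memory use grows with the product of the two sizes.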

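Comments:

These are matrices, not data frames.

Sorry, you're right. I made that mistake because the full data set I'm working with is a data.frame, although aichao's solution still works. I will change the wording now.

Very interesting solution, although because of do.call ...

Thanks, everyone! I'm just testing each solution to see which one runs faster on the full-size data set.

@Gopala: Actually, I don't think so, because do.call is used to call pmin, which works in parallel over a list (that is, over the columns of the distance matrix). That is why we convert with as.data.frame, which gives a list of columns. It does have to convert row by row, but that is not the same as an n-squared loop.

So, I agree that it is better optimized.

@DGreenwood: Yes, the second solution does not compute the minimum distance; it only looks for compounds whose distance to any row of df is below the tolerance, which is why it is faster. Be careful with it, though, because depending on the size of the data it may use too much memory.

Error: object 'x' not found.

Sorry about that. I renamed x to df but did not fix it everywhere. Edited now.

The second, faster solution: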
compounds[unique(which(abs(outer(compounds$Mass, df$Mass, "-")) < tol, arr.ind=TRUE)[,1]),]
##  Mass Compound
##1 3.39        A
##3 9.31        C
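On the memory concern raised in the comments: `outer()` builds a full matrix with one cell per (compound, observed mass) pair. If that becomes too large, one possible alternative is a sorted lookup with `findInterval()`, which only needs vectors as long as the inputs. The helper `nearest_dist()` below is a hypothetical sketch, not part of either answer, and it assumes the masses contain no `NA` values.

```r
# Hypothetical helper: distance from each x to its nearest value in ref,
# without building a length(x) by length(ref) matrix.
nearest_dist <- function(x, ref) {
  ref <- sort(ref)
  n <- length(ref)
  # i[k] is the number of ref values <= x[k] (0 when x[k] is below all of them)
  i <- findInterval(x, ref)
  below <- ref[pmax(i, 1L)]      # neighbour at or below x, clamped to the first value
  above <- ref[pmin(i + 1L, n)]  # neighbour above x, clamped to the last value
  pmin(abs(x - below), abs(above - x))
}

df <- data.frame(Mass = c(3.43534, 5.324, 9.322, 123.234),
                 I = c(31, 4214, 112, 44))
compounds <- data.frame(Mass = c(3.39, 102.93, 9.31, 144.00),
                        Compound = c("A", "B", "C", "D"),
                        stringsAsFactors = FALSE)
tol <- 0.2

hits <- compounds[nearest_dist(compounds$Mass, df$Mass) < tol, ]
hits
```

On the toy data this selects the same two compounds, A and C, as the `outer()` versions. Whether it is actually faster on the full data set would need to be measured.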
