Slow function: how to remove the for loop in R

r performance for-loop

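I have a function in R that compares a smaller vector against a larger vector, finds the positions where they match, and uses that information to extract data from a larger data frame.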

compare_masses <- function(mass_lst){
  for (i in seq_along(mass_lst)) {
    positions <- which(abs(AB_massLst_numeric - mass_lst[i]) < 0.02)
    rows <- AB_lst[positions,]
    match_df <- rbind(match_df, rows)
  }
}

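So my question is: how can I make this function faster, and possibly get rid of the for loop? My real data is much larger than the example I have provided. I can't think of a way to make this function work without iterating.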

Try wrapping it all in a single call and use do.call, so that all the rbind calls are performed at once rather than one at a time:

match_df <- do.call(rbind.data.frame, lapply(
    mass_lst, function(x)
        AB_lst[abs(AB_lst_numeric - x) < 0.02,]))
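
As a minimal sketch of the pattern above, the snippet below runs it on a small made-up data set. The sample values are assumptions for illustration only, and the mass vector is called AB_massLst_numeric as in the question (the answer's code writes AB_lst_numeric).

```r
# Toy data, assumed for illustration (not the asker's real data)
mass_lst <- c(375, 243, 676, 121)
AB_massLst_numeric <- c(323, 474, 812, 375, 999, 271, 676, 232, 676)
AB_lst <- data.frame(x = 1, y = AB_massLst_numeric)

# Build one small data frame per target mass ...
pieces <- lapply(mass_lst, function(m) {
  AB_lst[abs(AB_massLst_numeric - m) < 0.02, ]
})

# ... then bind them all in a single rbind call instead of
# growing match_df one iteration at a time.
match_df <- do.call(rbind, pieces)
print(match_df)
```

Targets with no match simply contribute a zero-row piece, so they do not need special handling here.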

This should be a vectorized solution.

Write an anonymous function and vectorize it, performing the same comparison as in the loop:

pos = Vectorize(FUN = function(y) {abs(AB_massLst_numeric-y) < 0.02}, vectorize.args = "y")

Then subset and store the result:

AB_lst[i,]
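
The answer does not show how the index i is derived from pos, so the following is only one possible way to connect the two steps, on made-up sample data. It assumes that pos is applied to mass_lst and that the row numbers of the TRUE cells are what gets stored in i.

```r
# Toy data, assumed for illustration
mass_lst <- c(375, 243, 676, 121)
AB_massLst_numeric <- c(323, 474, 812, 375, 999, 271, 676, 232, 676)
AB_lst <- data.frame(x = 1, y = AB_massLst_numeric)

# Same comparison as in the loop, vectorized over the target mass y
pos <- Vectorize(FUN = function(y) abs(AB_massLst_numeric - y) < 0.02,
                 vectorize.args = "y")

# Logical matrix: one row per row of AB_lst, one column per target mass
hits <- pos(mass_lst)

# Row numbers of every TRUE cell, grouped by target mass (column order)
i <- which(hits, arr.ind = TRUE)[, "row"]

matched <- AB_lst[i, ]
print(matched)
```

Note that Vectorize is a wrapper around mapply, so it still iterates over the targets internally; it mainly tidies up the calling code.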

Edit: use the compare_masses function. It is much faster than the other solutions here:

Unit: microseconds
           expr      min       lq      mean   median       uq      max neval  cld
      Vectorize  318.595  327.280  358.9813  355.112  386.892  413.739    10  b  
        do.call 1418.473 1510.853 1569.7161 1578.954 1635.606 1744.173    10    d
      bind_rows  744.570  801.420  813.9346  815.435  836.161  871.297    10   c 
 compare_masses  135.808  138.176  158.0344  158.508  169.365  197.395    10 a

With a larger test data set:

Unit: nanoseconds
           expr      min       lq         mean   median       uq       max neval cld
      Vectorize   239242   292341   342314.079   324714   359455   3480844  1000 a  
 compare_masses      395     1975     3674.669     3554     4738     19346  1000 a  
        do.call 16570424 18223007 21092022.254 20921183 22194176 159718470  1000   c
      bind_rows 13423572 14869680 17027330.356 17008639 18061341 116983885  1000  b

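Use R's vector recycling. First build the positions vector, which holds at most N*m entries, where N is the number of rows in AB_lst and m is length(mass_lst). Then use this vector to select rows from the data frame.

See the complete runnable example below: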

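compare_masses <- function(mass_lst) {
  # Collect the matching row indices for every target mass
  positions <- integer(0)
  for (i in seq_along(mass_lst)) {
    positions <- c(positions, which(abs(AB_massLst_numeric - mass_lst[i]) < 0.02))
  }
  # Select all matching rows in a single subsetting step
  AB_lst[positions, ]
}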

mass_lst <- c(375, 243, 676, 121)
AB_massLst_numeric <- c(323, 474, 812, 375, 999, 271, 676, 232, 676)

AB_lst <- data.frame(x=1,y=AB_massLst_numeric)
match_df <- AB_lst[c(),]

compare_masses(mass_lst)
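
For comparison, here is a loop-free sketch of the same idea that builds all N*m comparisons at once with base R's outer(). This is a swapped-in technique, not the answer's code; the function name compare_masses_outer and the tol argument are hypothetical names chosen for illustration.

```r
# Hypothetical helper: all pairwise comparisons in one call
compare_masses_outer <- function(mass_lst, masses, df, tol = 0.02) {
  # N x m logical matrix: hits[r, k] is TRUE when masses[r]
  # lies within tol of mass_lst[k]
  hits <- outer(masses, mass_lst, function(a, b) abs(a - b) < tol)
  # Row index of each TRUE cell, in column-major order, i.e. all
  # matches for mass_lst[1], then for mass_lst[2], and so on
  positions <- row(hits)[hits]
  df[positions, ]
}

# Same toy data as in the example above
mass_lst <- c(375, 243, 676, 121)
AB_massLst_numeric <- c(323, 474, 812, 375, 999, 271, 676, 232, 676)
AB_lst <- data.frame(x = 1, y = AB_massLst_numeric)

result <- compare_masses_outer(mass_lst, AB_massLst_numeric, AB_lst)
print(result)
```

The trade-off is memory: the intermediate matrix has N*m cells, so for very large inputs the loop that only stores matching indices may be the safer choice.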

You can loop to find the row indices you need, then select the rows based on them:

set.seed(1)
DF <- data.frame(x = runif(1e2), y = sample(letters, 1e2, replace = TRUE))
LIST <- list(0, 0.2, 0.4, 0.5)
DF[unlist(lapply(LIST, function(y) which(abs(DF$x - y) < .02))), ]

Note that the values we select are indeed within 0.02 of the targets.
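
The claim that every selected value lies within 0.02 of a target can be checked programmatically. The sketch below repeats the selection and then verifies it; the names targets, tol, idx, res and nearest are hypothetical and introduced only for this check.

```r
set.seed(1)
DF <- data.frame(x = runif(1e2), y = sample(letters, 1e2, replace = TRUE))
targets <- c(0, 0.2, 0.4, 0.5)
tol <- 0.02

# Row indices of all values within tol of any target
idx <- unlist(lapply(targets, function(target) which(abs(DF$x - target) < tol)))
res <- DF[idx, ]

# Distance from each selected x to its nearest target
nearest <- vapply(res$x, function(v) min(abs(v - targets)), numeric(1))

# Stops with an error if any selected row falls outside the tolerance
stopifnot(all(nearest < tol))
```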
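
If this is still slow, using dplyr::bind_rows in place of do.call may speed things up considerably.

You need a which() around abs(AB_lst_numeric - x) < 0.02, or element recycling will cause unintended rows to be subsetted, depending on length(abs(AB_lst_numeric - x) < 0.02).

@Vlo, that is not correct (at least for 3.2.1). With or without which(), the data.frame produced by that call will have a row of NAs for any row selected past the row indices of AB_lst. I think it is acceptable to assume these are corresponding structures rather than checking strictly at this stage. If we wanted to check, we would have to do something like: AB_lst[intersect(which(abs(AB_lst_numeric - x) < 0.02), 1:nrow(AB_lst)), ]

Are you saying that R 3.2.1 patched the vector recycling behaviour that has been there from the start? There is even a section on it in the current R documentation. Subsetting rows with a logical vector shorter than the data.frame/matrix certainly does not return NA for me. Sorry, though: you are talking about the lengths mismatching in the other direction, with the vector being longer.

Indeed, that is possible, but I think trusting that the data structures correspond is clearly workable.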