Optimizing nested R loops using apply or similar tools

python r loops optimization


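The code I show here computes exactly what I want, except for one problem: on large datasets it takes far too long. So I would like to know whether there is an alternative solution using the apply() family or some other approach.

I always struggle to re-express nested loops as vectorized functions. Could you help me? I would really appreciate it ;)

Before running this nested loop, I already have:

2 data frames, called "DATA" and "DATA_100_WELLS". From them I only need the variables "WELL" (categorical) and "DELTA" (numeric).

3 global variables, called ti, ta and tb, which appear in the nested loop.

"chosen_model", which I use in the function "predict".

That's it... sorry if this is hard to follow.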

    # Loop over each WELL in "DATA_100_WELLS"
    for (WELL_PROCESS in unique(DATA_100_WELLS$WELL)) {
      #----------------------------------------------------------------------------------
      # I take just 1 of the wells
      print("WELL------------------------------------------------------------")
      print(WELL_PROCESS)
      DATA_WELL <- DATA_100_WELLS[DATA_100_WELLS$WELL==WELL_PROCESS,]  # select just the well I want

      # I calculate some stuff (Var_est0, sigma, linf, lsup, Za, Zb, n_ray and A)
      DATA_WELL$Var_est0 = predict(chosen_model,data.frame(predict=DATA_WELL$predict))
      DATA_WELL$sigma = sqrt(DATA_WELL$Var_est0)
      DATA_WELL$linf <- DATA_WELL$predict+DATA_WELL$sigma*ta
      DATA_WELL$lsup <- DATA_WELL$predict+DATA_WELL$sigma*tb
      Za <- qnorm(alfa/2)
      Zb <- qnorm(1-alfa/2)
      n_ray <- mean(DATA_WELL$predict)
      A = sum(DATA_WELL$Var_est0)

      # Then I create an empty df called "TABLE" by slicing off the placeholder row
      TABLE<-data.frame(well="",d=0,p=0)
      TABLE<-TABLE[-1,]

      # After that, I iterate over each WELL from the second df, "DATA"
      for (well in unique(DATA$WELL)){
        print(paste("Process...: ",well,sep=""))
        # I calculate variable "large", based on the max value of the existing variable "DELTA" (numeric)
        large = max(DATA[DATA$WELL==well,]$DELTA)
        # cycle from 1 to the max distance (large-1)
        for (d in c(1:(large-1))){
          # cycle from position 1 to large-distance (note how this turns out to be symmetric)
          for (pos in (1:(large-d))){
            # I did all of this to calculate variables ti and tj
            ti = DATA[DATA$WELL==well & DATA$DELTA==pos,]$ti
            tj = DATA[DATA$WELL==well & DATA$DELTA==pos+d,]$ti
            # I append the results to the once-empty df "TABLE", and calculate p as ti*tj
            TABLE<-rbind(TABLE,data.frame(well=well,d=d,p=ti*tj))
          }
        }
      }
    }  # closes the loop over WELL_PROCESS (this brace is missing from the code as posted)

As noted in the comments, it is hard to help without a reproducible example, but I'll give it a try. The following changes should speed things up:

1) Don't repeatedly bind an object to itself; instead, insert the elements into a list and call bind_rows after the loop.

2) Subset the DATA df into wells_df once per well in the outer loop.

Even better than #2, but not implemented here: you could split the data into a list before the loop, so that you only traverse the data once.

I have not run this code.

    table_agg <- list()
    for (well in unique(DATA$WELL)){
      print(paste("Process...: ",well,sep=""))
      # Subset once per well, then compute "large" from the max value of "DELTA" (numeric)
      wells_df <- DATA[DATA$WELL==well,]
      large = max(wells_df$DELTA)
      # cycle from 1 to the max distance (large-1)
      for (d in c(1:(large-1))){
        # cycle from position 1 to large-distance
        for (pos in (1:(large-d))){
          # Look up ti and tj in the per-well subset only
          ti = wells_df[wells_df$DELTA==pos,]$ti
          tj = wells_df[wells_df$DELTA==pos+d,]$ti
          # Store each result in the list instead of rbind-ing onto TABLE
          table_agg[[length(table_agg)+1]]<-data.frame(well=well,d=d,p=ti*tj)
        }
      }
    }
    # Bind everything once, after the loop
    TABLE <- dplyr::bind_rows(table_agg)

Without any sample data to test with, I won't even attempt this! A few suggestions: don't bind inside the loop. The bind command creates copies in memory, and as the data frame grows it becomes a slow process. Second, instead of constantly checking/filtering via DATA$WELL==well, use the split function to create a list of smaller data frames to work with. Performance will improve slightly, simply from using less memory and making fewer comparisons. Finally, try to vectorize the inner loop; that will be the biggest performance improvement.

No benchmarking, profiling, or ? I'm not sure how people are supposed to help. The assumption that apply-family functions are faster than for loops is usually wrong. You won't see much improvement that way.

Awesome! Unbelievable, it saves so much time now. Thank you, Patrick. Thanks, everyone, for sharing. I'll follow the orthodox way next time. I apologize for that :)
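The two suggestions from the comments that the answer leaves unimplemented (split the data once, then vectorize the two inner loops) might look roughly like the sketch below. It is not code from the thread: the helper `pairs_for_well` and the toy `DATA` are hypothetical, and it assumes every well has exactly one row for each DELTA value from 1 to max(DELTA), which the original loop also appears to rely on.

```r
# Hypothetical toy input: one row per (WELL, DELTA), DELTA running 1..n within each well.
DATA <- data.frame(
  WELL  = rep(c("A", "B"), times = c(4, 3)),
  DELTA = c(1:4, 1:3),
  ti    = c(0.5, 1.0, 1.5, 2.0, 2.0, 3.0, 4.0)
)

# Hypothetical helper: build all (well, d, p) rows for one well without explicit loops.
pairs_for_well <- function(wells_df) {
  large <- max(wells_df$DELTA)
  if (large < 2) return(NULL)  # a single position yields no pairs

  # ti_by_delta[k] is the ti value at DELTA == k (assumes one row per DELTA)
  ti_by_delta <- wells_df$ti[match(seq_len(large), wells_df$DELTA)]

  # Enumerate (d, pos) in the same order as the nested loops:
  # d = 1 has large-1 positions, d = 2 has large-2, ..., d = large-1 has 1.
  n_pos <- (large - 1):1
  d     <- rep(seq_len(large - 1), times = n_pos)
  pos   <- sequence(n_pos)

  data.frame(
    well = as.character(wells_df$WELL[1]),
    d    = d,
    p    = ti_by_delta[pos] * ti_by_delta[pos + d]
  )
}

# Split once, process each well, and bind once at the end (base R only).
TABLE <- do.call(rbind, lapply(split(DATA, DATA$WELL), pairs_for_well))
rownames(TABLE) <- NULL
```

Note that `split` orders the wells by factor level (alphabetically for character columns), whereas `unique(DATA$WELL)` keeps order of first appearance, so row order can differ from the loop version on real data. Wrapping both versions in `system.time()` is a simple way to check whether the rewrite actually helps on your data.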