How can I speed up a while loop in R (perhaps using dopar)?


r loops parallel-processing


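I am trying to process a huge text file containing tens of millions of lines. The file holds the results of a convnet analysis of several million images and looks like this: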

 CUDNN_HALF=1 
net.optimized_memory = 0 
mini_batch = 1, batch = 8, time_steps = 1, train = 0 
nms_kind: greedynms (1), beta = 0.600000 
nms_kind: greedynms (1), beta = 0.600000 
nms_kind: greedynms (1), beta = 0.600000 

 seen 64, trained: 447 K-images (6 Kilo-batches_64) 
Enter Image Path: data/obj1/H001683-19-1-5-OCT2 [x=13390,y=52118,w=256,h=256].png: Predicted in 19.894000 milli-seconds.
tumor: 99%  (left_x:    2   top_y:  160   width:   67   height:   34)
bcell: 98%  (left_x:    6   top_y:   54   width:   32   height:   22)
bcell: 80%  (left_x:   51   top_y:    0   width:   30   height:   16)
bcell: 98%  (left_x:   52   top_y:  198   width:   28   height:   26)
bcell: 98%  (left_x:  150   top_y:  216   width:   35   height:   23)
bcell: 56%  (left_x:  150   top_y:   78   width:   45   height:   30)
bcell: 91%  (left_x:  187   top_y:  132   width:   31   height:   26)
bcell: 96%  (left_x:  219   top_y:  185   width:   20   height:   26)
bcell: 37%  (left_x:  222   top_y:   -0   width:   24   height:    4)
bcell: 98%  (left_x:  241   top_y:  208   width:   15   height:   21)
bcell: 64%  (left_x:  248   top_y:   35   width:    8   height:   35)
 [... a lot of similar lines...] 
Enter Image Path: data/obj1/H001683-19-1-5-OCT2 [x=13390,y=52530,w=256,h=256].png: Predicted in 19.195000 milli-seconds.
bcell: 97%  (left_x:   45   top_y:  180   width:   29   height:   24)
bcell: 58%  (left_x:   59   top_y:    1   width:   35   height:   22)
tumor: 98%  (left_x:  105   top_y:  143   width:   99   height:   44)
tumor: 97%  (left_x:  113   top_y:   50   width:   57   height:   40)
bcell: 96%  (left_x:  191   top_y:  194   width:   29   height:   27)
bcell: 99%  (left_x:  201   top_y:  129   width:   34   height:   22)
Enter Image Path:

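Each image is introduced by its file name after "Enter Image Path:", followed by a list of the detected objects. I don't know how many objects (here, for example, B cells) each image contains. Sometimes there are no objects at all, sometimes there are several hundred. I first tried using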

test11<-readLines("result.txt")
picsna<-grep(test11,pattern="Enter Image") # line numbers with the image file name
lle<-length(picsna) # length for the subsequent script

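I measured the run time of the first and the second script on a small result file of about 200 lines. The second script was even a bit slower (0.04 vs 0.01), which puzzles me.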

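I would like to rewrite it as a foreach %dopar% loop, but I could not figure out how to do that with either the readLines function or my while loop. My problem is that I don't know how many lines the file contains. I would be grateful if somebody could help me parallelize my script.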
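Not knowing the line count need not block a loop: reading from an open connection in fixed-size chunks simply stops when nothing more comes back. The following is a minimal sketch of that idea, not a parallel solution. The names find_image_lines and chunk_size are made up for illustration, and the sketch assumes the file is plain text that readLines can read.

```r
# Minimal sketch: scan a file in chunks without knowing its length in advance.
# find_image_lines() and chunk_size are hypothetical names, not from the original post.
find_image_lines <- function(path, pattern = "Enter Image", chunk_size = 1e6) {
  con <- file(path, open = "r")
  on.exit(close(con))
  hits <- list()   # matching line numbers, collected chunk by chunk
  offset <- 0      # number of lines read before the current chunk
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0) break   # end of file reached
    # grep() is vectorized over the chunk; shift its indices to file line numbers
    hits[[length(hits) + 1]] <- offset + grep(pattern, chunk, fixed = TRUE)
    offset <- offset + length(chunk)
  }
  list(lines = as.integer(unlist(hits)), total = offset)
}
```

A call such as res <- find_image_lines("result.txt") would return the matching line numbers in res$lines and the total number of lines in res$total. Whether handing the chunks to foreach %dopar% workers would help further is an open assumption, since reading the file may well dominate the run time.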

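Thanks @Bas! I tested your suggestion on a Linux machine: for a file with about 239 million lines it took less than 1 minute. By appending >lines.txt to the command I could save the results. Interestingly, my first readLines R script needed "only" 29 minutes, which is amazingly fast compared with my first experience (so I may have run into some problem with my Windows computer at work that has nothing to do with R).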
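To continue in R from the saved file, the line numbers can be loaded back as an integer vector. This is a minimal sketch that assumes lines.txt holds one line number per line, as the cut command would produce. The helper name read_line_numbers is hypothetical.

```r
# Minimal sketch: load the line numbers written by the shell command.
# Assumes the file holds one integer per line; read_line_numbers() is a hypothetical helper.
read_line_numbers <- function(path = "lines.txt") {
  scan(path, what = integer(), quiet = TRUE)
}
```

With this, picsna <- read_line_numbers("lines.txt") and lle <- length(picsna) would give the same two variables the R scripts build.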

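How long does a simple command-line script take? For example,

grep -hn -R "Enter Image" result.txt | cut -f1 -d:

gives you the line numbers of the matching lines. If that runs fast enough, you could consider doing the search for the line numbers outside of R.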
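If you prefer to stay inside an R session, the same search can be started from R with system2. This is a minimal sketch that assumes a grep executable is on the PATH (as on most Linux and macOS systems). The helper name grep_line_numbers is hypothetical.

```r
# Minimal sketch: run grep from R and parse its "N:text" output.
# Assumes grep is available on the PATH; grep_line_numbers() is a hypothetical helper.
grep_line_numbers <- function(path, pattern = "Enter Image") {
  # -n prefixes each match with its line number, -F treats the pattern as a fixed string
  out <- system2("grep", c("-n", "-F", shQuote(pattern), shQuote(path)), stdout = TRUE)
  # keep only the part before the first colon, i.e. the line number
  as.integer(sub(":.*$", "", out))
}
```

Here the sub() call plays the role of cut -f1 -d:, so no pipe is needed.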

library(LaF)

# line-by-line version using get_lines() from the LaF package
n <- 1
lle <- 0        # number of images (to be used in a subsequent script)
picsna <- c()   # vector with the line numbers of the image entries

# read the result file initially (the first bunch of lines does not contain image entries)
test11 <- get_lines("result.txt", line_numbers = n)
# as long as the line exists, read the next line and do the following:
while (!is.na(test11)) {
  test11 <- get_lines("result.txt", line_numbers = n + 1)
  # I wanted to know how far the reading had progressed, but had a feeling that print slowed down the loop
  # print(n)
  # I found this solution for printing the progress periodically
  if (n %% 10000 == 0) {
    cat(paste0("iteration: ", n, "\n"))
  }
  # look for an image entry and save the line number (not the iteration number)
  if (grepl("Enter Image", test11)) {
    picsna <- c(picsna, n + 1)
    lle <- lle + 1   # increase the number of images
  }
  n <- n + 1
}
# the last line of the file is always incomplete, but it has to be added to the vector
# to calculate the number of objects (in a following script not shown here) if the previous image had any
if (is.na(test11)) {
  picsna <- c(picsna, n)
  print("The End")
  lle <- lle + 1
}
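The follow-up script that counts the objects is not shown, so the following is only a guess at how picsna could be used. It assumes that every line between two consecutive "Enter Image" lines describes exactly one detected object. The function name count_objects is hypothetical.

```r
# Minimal sketch: objects per image from the line numbers of the image entries.
# Assumes every line between two consecutive "Enter Image" lines is one object.
count_objects <- function(picsna) {
  # the gap between neighbouring entries, minus the entry line itself
  diff(picsna) - 1L
}

# hypothetical line numbers: entries on lines 10, 22 and 29
count_objects(c(10L, 22L, 29L))   # returns 11 and 6
```

The result has one element fewer than picsna, because the final entry only marks where the last image's object list ends.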