How can I speed up a while loop in R (possibly using dopar)?

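I am trying to process a huge text file containing tens of millions of lines. The file holds the results of a convnet analysis of several million images and looks like this: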

 CUDNN_HALF=1 
net.optimized_memory = 0 
mini_batch = 1, batch = 8, time_steps = 1, train = 0 
nms_kind: greedynms (1), beta = 0.600000 
nms_kind: greedynms (1), beta = 0.600000 
nms_kind: greedynms (1), beta = 0.600000 

 seen 64, trained: 447 K-images (6 Kilo-batches_64) 
Enter Image Path: data/obj1/H001683-19-1-5-OCT2 [x=13390,y=52118,w=256,h=256].png: Predicted in 19.894000 milli-seconds.
tumor: 99%  (left_x:    2   top_y:  160   width:   67   height:   34)
bcell: 98%  (left_x:    6   top_y:   54   width:   32   height:   22)
bcell: 80%  (left_x:   51   top_y:    0   width:   30   height:   16)
bcell: 98%  (left_x:   52   top_y:  198   width:   28   height:   26)
bcell: 98%  (left_x:  150   top_y:  216   width:   35   height:   23)
bcell: 56%  (left_x:  150   top_y:   78   width:   45   height:   30)
bcell: 91%  (left_x:  187   top_y:  132   width:   31   height:   26)
bcell: 96%  (left_x:  219   top_y:  185   width:   20   height:   26)
bcell: 37%  (left_x:  222   top_y:   -0   width:   24   height:    4)
bcell: 98%  (left_x:  241   top_y:  208   width:   15   height:   21)
bcell: 64%  (left_x:  248   top_y:   35   width:    8   height:   35)
 [... a lot of similar lines...] 
Enter Image Path: data/obj1/H001683-19-1-5-OCT2 [x=13390,y=52530,w=256,h=256].png: Predicted in 19.195000 milli-seconds.
bcell: 97%  (left_x:   45   top_y:  180   width:   29   height:   24)
bcell: 58%  (left_x:   59   top_y:    1   width:   35   height:   22)
tumor: 98%  (left_x:  105   top_y:  143   width:   99   height:   44)
tumor: 97%  (left_x:  113   top_y:   50   width:   57   height:   40)
bcell: 96%  (left_x:  191   top_y:  194   width:   29   height:   27)
bcell: 99%  (left_x:  201   top_y:  129   width:   34   height:   22)
Enter Image Path: 
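Each image is identified by its file name after "Enter Image Path:", followed by a list of the detected objects. I do not know how many objects (B cells, for example) each image contains. Sometimes there are none at all, sometimes there are several hundred. I first tried reading the whole file with readLines: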

test11 <- readLines("result.txt")
picsna <- grep("Enter Image", test11)  # line numbers with the image file name
lle <- length(picsna)                  # length for the subsequent script
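I measured the running time of the first and the second script on a small result file of about 200 lines (the second script, which reads the file line by line with LaF, is shown at the end of this page). The second script was even a bit slower (0.04 vs 0.01), which puzzles me.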
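I would like to rewrite it as a foreach %dopar% loop, but I could not figure out how to do that with the readLines function or with my while loop. My problem is that I do not know how many lines the file contains. I would be grateful if someone could help me parallelize my script.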
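Not knowing the line count in advance need not be a blocker: a file connection can be read in fixed-size chunks until it is exhausted, which avoids holding the whole file in memory and avoids one function call per line. The following is a minimal sequential sketch, not a foreach %dopar% loop. The function name find_image_lines and the chunk_size default are illustrative assumptions and not part of the original scripts.

```r
# Hypothetical helper: scan a large text file in chunks and return the
# line numbers that contain `pattern`, plus the total number of lines read.
find_image_lines <- function(path, pattern = "Enter Image", chunk_size = 1e6) {
  con <- file(path, open = "r")
  on.exit(close(con), add = TRUE)

  hits <- list()   # one vector of matching line numbers per chunk
  offset <- 0      # number of lines read before the current chunk

  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0) break  # end of file reached
    # grep() is vectorised over the chunk; fixed = TRUE treats the pattern as plain text
    hits[[length(hits) + 1L]] <- offset + grep(pattern, chunk, fixed = TRUE)
    offset <- offset + length(chunk)
  }

  list(lines = as.numeric(unlist(hits)), total = offset)
}
```

Calling find_image_lines("result.txt") would return the matching line numbers in the lines element (the counterpart of picsna) and the number of lines in the file in total. Whether this is faster than a single readLines call depends on available memory and should be measured.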

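Thanks @Bas! I tested your suggestion on a Linux machine: for a file with about 239 million lines it took less than a minute. By appending > lines.txt to the command I could save the result. Interestingly, my first R script (the readLines one) "only" took 29 minutes, which is surprisingly fast compared with my first experience, so I probably ran into some Windows issue on my work computer that has nothing to do with R.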

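How long does a simple command-line script take? For example,
grep -hn -R "Enter Image" result.txt | cut -f1 -d:
gives you the matching line numbers. If that runs fast enough, you could consider doing the line search outside of R.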
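To use those line numbers back in R, the output of grep can be captured directly instead of going through an intermediate file. A minimal sketch, assuming a Unix-like system with grep on the PATH; the helper name grep_line_numbers is hypothetical, and -F (plain-text matching) is an addition not present in the command above:

```r
# Hypothetical wrapper: let the system's grep find the matching lines and
# return their line numbers as an integer vector.
grep_line_numbers <- function(path, pattern = "Enter Image") {
  out <- system2(
    "grep",
    args = c("-n", "-F", "-e", shQuote(pattern), shQuote(path)),
    stdout = TRUE
  )
  # grep -n prints "<line number>:<line text>"; keep only the number
  as.integer(sub(":.*$", "", out))
}
```

With a single input file, grep_line_numbers("result.txt") would play the role of picsna in the R scripts. If nothing matches, grep exits with status 1, so system2 emits a warning and the function returns an empty integer vector.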
Second script (reads the file line by line with LaF):

library(LaF)

n <- 1
lle <- 0       # number of images (used in subsequent code)
picsna <- c()  # vector with the line numbers of the image entries

# Read the result file initially (the first few lines do not contain image entries)
test11 <- get_lines(file = "result.txt", line_numbers = n)

# As long as the current line exists, read the next line and do the following:
while (!is.na(test11)) {
  test11 <- get_lines(file = "result.txt", line_numbers = n + 1)

  # Report progress only periodically; printing on every iteration seemed to slow the loop down
  if (n %% 10000 == 0) {
    cat(paste0("iteration: ", n, "\n"))
  }

  # Look for an image entry and save its line number (not the iteration number)
  if (grepl("Enter Image", test11)) {
    picsna <- c(picsna, n + 1)
    lle <- lle + 1  # increase the number of images
  }
  n <- n + 1
}

# The last line of the file is always incomplete, but its position has to be added to the vector
# so that the number of objects of the previous image (if it had any) can be calculated
# in a following script, not shown here.
if (is.na(test11)) {
  picsna <- c(picsna, n)
  print("The End")
  lle <- lle + 1
}
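
The comment about the final entry hints at how a follow-up script can count objects: if every line between two consecutive image entries describes one detected object, the count per image is the gap between neighbouring entries minus one. A small sketch under that assumption, using made-up picsna values for illustration only:

```r
# Illustrative values only: line numbers of the "Enter Image" entries,
# with the end-of-file position appended as the last element.
picsna <- c(9, 21, 28, 29)

# Number of lines strictly between consecutive entries, one per detected object
objects_per_image <- diff(picsna) - 1
objects_per_image
```

Because diff() is vectorised, this step needs no loop even for millions of images. The assumption breaks if other lines (log messages, for example) appear between image entries, in which case the object lines would have to be matched explicitly.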