How can I speed up a while loop in R (possibly using dopar)?
I am trying to process a huge text file containing tens of millions of lines of text. The file holds the results of a convnet analysis of several million images and looks like this:
CUDNN_HALF=1
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
nms_kind: greedynms (1), beta = 0.600000
nms_kind: greedynms (1), beta = 0.600000
nms_kind: greedynms (1), beta = 0.600000
seen 64, trained: 447 K-images (6 Kilo-batches_64)
Enter Image Path: data/obj1/H001683-19-1-5-OCT2 [x=13390,y=52118,w=256,h=256].png: Predicted in 19.894000 milli-seconds.
tumor: 99% (left_x: 2 top_y: 160 width: 67 height: 34)
bcell: 98% (left_x: 6 top_y: 54 width: 32 height: 22)
bcell: 80% (left_x: 51 top_y: 0 width: 30 height: 16)
bcell: 98% (left_x: 52 top_y: 198 width: 28 height: 26)
bcell: 98% (left_x: 150 top_y: 216 width: 35 height: 23)
bcell: 56% (left_x: 150 top_y: 78 width: 45 height: 30)
bcell: 91% (left_x: 187 top_y: 132 width: 31 height: 26)
bcell: 96% (left_x: 219 top_y: 185 width: 20 height: 26)
bcell: 37% (left_x: 222 top_y: -0 width: 24 height: 4)
bcell: 98% (left_x: 241 top_y: 208 width: 15 height: 21)
bcell: 64% (left_x: 248 top_y: 35 width: 8 height: 35)
[... a lot of similar lines...]
Enter Image Path: data/obj1/H001683-19-1-5-OCT2 [x=13390,y=52530,w=256,h=256].png: Predicted in 19.195000 milli-seconds.
bcell: 97% (left_x: 45 top_y: 180 width: 29 height: 24)
bcell: 58% (left_x: 59 top_y: 1 width: 35 height: 22)
tumor: 98% (left_x: 105 top_y: 143 width: 99 height: 44)
tumor: 97% (left_x: 113 top_y: 50 width: 57 height: 40)
bcell: 96% (left_x: 191 top_y: 194 width: 29 height: 27)
bcell: 99% (left_x: 201 top_y: 129 width: 34 height: 22)
Enter Image Path:
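Each image is introduced by its file name after "Enter Image Path:", followed by a list of the detected objects. I do not know how many objects (here, for example, B cells) each image contains. Sometimes there are none at all, sometimes several hundred.
I first tried using readLines: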
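To make the line format described above concrete, here is a minimal sketch that turns detection lines into a data frame. The helper name parse_detections and the column names are hypothetical, and the regular expression assumes every detection line follows the "label: NN% (left_x: ... top_y: ... width: ... height: ...)" layout of the sample; lines that do not match (headers, "Enter Image Path" entries) are silently skipped.

```r
# Hypothetical helper, not part of the original scripts.
# Assumes the detection layout shown in the sample output.
parse_detections <- function(lines) {
  pattern <- paste0(
    "^([A-Za-z_]+):\\s*([0-9]+)%\\s*\\(",
    "left_x:\\s*(-?[0-9]+)\\s+top_y:\\s*(-?[0-9]+)\\s+",
    "width:\\s*(-?[0-9]+)\\s+height:\\s*(-?[0-9]+)\\)\\s*$"
  )
  hits <- regmatches(lines, regexec(pattern, lines))
  hits <- hits[lengths(hits) == 7L]  # full match + 6 capture groups
  # One row per detection; also works when nothing matched (0 rows).
  m <- matrix(as.character(unlist(hits)), ncol = 7L, byrow = TRUE)
  data.frame(
    label      = m[, 2],
    confidence = as.integer(m[, 3]),  # percentage as printed
    left_x     = as.integer(m[, 4]),
    top_y      = as.integer(m[, 5]),
    width      = as.integer(m[, 6]),
    height     = as.integer(m[, 7]),
    stringsAsFactors = FALSE
  )
}
```

Because the function is vectorized over its input, it can be applied to a whole block of lines at once rather than line by line.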
test11 <- readLines("result.txt")
picsna <- grep(pattern = "Enter Image", x = test11) # line numbers of the lines with an image file name
lle <- length(picsna) # length, used by the subsequent script
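The line numbers in picsna are enough to count the objects per image without any loop. The following is only a sketch under two assumptions taken from the sample output: every line between two "Enter Image" entries is one detected object, and the file ends with an empty "Enter Image Path:" prompt. The helper name count_objects and the file names a.png and b.png are made up for illustration.

```r
# Hypothetical helper: objects per image from the entry line numbers.
# Assumes the last entry is the trailing, empty "Enter Image Path:" prompt.
count_objects <- function(picsna) {
  diff(picsna) - 1L  # lines strictly between consecutive entries
}

# Made-up miniature result file for illustration.
lines <- c(
  "seen 64, trained: 447 K-images (6 Kilo-batches_64)",
  "Enter Image Path: a.png: Predicted in 19.894000 milli-seconds.",
  "tumor: 99% (left_x: 2 top_y: 160 width: 67 height: 34)",
  "bcell: 98% (left_x: 6 top_y: 54 width: 32 height: 22)",
  "Enter Image Path: b.png: Predicted in 19.195000 milli-seconds.",
  "Enter Image Path:"
)
picsna <- grep("Enter Image", lines, fixed = TRUE)
objects_per_image <- count_objects(picsna)
```

Here the first image has two objects and the second has none; the trailing prompt only serves as the closing boundary.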
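I measured the run time of the first and the second script on a small result file of about 200 lines. The second script was even slightly slower (0.04 vs 0.01), which puzzles me.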
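I would like to rewrite it as a foreach loop with %dopar%, but I could not figure out how to do that with the readLines function or with my while loop. My problem is that I do not know how many lines the file contains. I would be grateful if someone could help me parallelize my script.
How long does a simple command-line script take? For example,
grep -hn -R "Enter Image" result.txt | cut -f1 -d:
gives you the matching line numbers. If that runs fast enough, you could consider doing the line-number search outside of R.
Thanks @Bas! I tested your suggestion on a Linux machine: for a file with about 239 million lines it took less than 1 minute. By appending > lines.txt I could save the result. Interestingly, my first readLines R script took "only" 29 minutes, which is astonishingly fast compared with my first experience (so I probably ran into some problem with my Windows computer at work that has nothing to do with R).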
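For the parallel variant the question asks about, one possible shape is sketched below. It uses the base parallel package (clusterMap) in place of foreach with %dopar%, purely to avoid extra dependencies, and it assumes the lines are already in memory, for example from readLines. The function names grep_block and find_image_lines_parallel are hypothetical. Given that the plain grep runs reported in this thread were already fast, the overhead of shipping the text to the workers may well outweigh any gain.

```r
library(parallel)

# Worker: global line numbers of the matches inside one block.
grep_block <- function(block, offset, pattern) {
  offset + grep(pattern, block, fixed = TRUE)
}

# Hypothetical wrapper: split the lines into contiguous blocks,
# search each block on its own worker, and merge the line numbers.
find_image_lines_parallel <- function(lines, n_workers = 2L,
                                      pattern = "Enter Image") {
  if (length(lines) == 0L) return(integer(0))
  block_size <- ceiling(length(lines) / n_workers)
  block_id <- ceiling(seq_along(lines) / block_size)
  blocks <- unname(split(lines, block_id))
  offsets <- (seq_along(blocks) - 1L) * block_size  # lines before each block
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)
  hits <- clusterMap(cl, grep_block, blocks, offsets,
                     MoreArgs = list(pattern = pattern))
  as.integer(unlist(hits, use.names = FALSE))
}
```

The result should match what a single grep over all lines returns, so the two can be compared directly on a small file before trying it on a large one.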
library(LaF)

n <- 1
lle <- 0      # number of images (used in a subsequent script)
picsna <- c() # vector with the line numbers of the image entries

# Read the first line of the result file (the first few lines do not contain image entries).
test11 <- get_lines(file = "result.txt", line_numbers = n)

# As long as the current line exists, read the next line and do the following:
while (!is.na(test11)) {
  test11 <- get_lines(file = "result.txt", line_numbers = n + 1)
  # I wanted to know how far the reading had progressed, but had the feeling that print() slowed the loop down.
  # print(n)
  # Solution I found for printing the progress only periodically:
  if (n %% 10000 == 0) {
    cat(paste0("iteration: ", n, "\n"))
  }
  # Look for an image entry and save its line number (not the iteration number).
  if (grepl(pattern = "Enter Image", x = test11)) {
    picsna <- c(picsna, n + 1)
    lle <- lle + 1 # increase the number of images
  }
  n <- n + 1
}

# The last line of the file is always incomplete, but it has to be added to the vector
# to calculate the number of objects of the previous image, if it had any
# (done in a following script not shown here).
if (is.na(test11)) {
  picsna <- c(picsna, n)
  print("The End")
  lle <- lle + 1
}
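The loop above issues one get_lines call per line and grows picsna one element at a time, and both add per-iteration overhead. A possible alternative, sketched here with base R only, reads the file through an open connection in large chunks, so the total number of lines does not need to be known in advance. The function name find_image_lines and the default chunk_size are assumptions, not part of the original scripts. Note that readLines also returns an incomplete final line, so a trailing "Enter Image Path:" prompt is found by the same search.

```r
# Hypothetical helper: single pass over the file in chunks.
find_image_lines <- function(path, pattern = "Enter Image",
                             chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con), add = TRUE)
  hits <- list()
  offset <- 0  # number of lines read so far
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0L) break  # end of file reached
    hits[[length(hits) + 1L]] <- offset + grep(pattern, chunk, fixed = TRUE)
    offset <- offset + length(chunk)
  }
  list(line_numbers = as.numeric(unlist(hits)), n_lines = offset)
}
```

A call such as find_image_lines("result.txt") would return the entry line numbers together with the total line count; a larger chunk_size trades memory for fewer read calls.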