Calling an executable inside a foreach loop in R: Segmentation fault (core dumped) warning

r linux parallel-processing

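I have an R script that performs some computations and writes the results to an input.txt file (in the working directory). Then, still within the R script, I call a Linux executable (a .out file). It takes the input.txt file as input, does some work, and writes its results to an output.txt file (also in the working directory).

This works fine outside of a foreach loop. Inside the loop, I get a Segmentation fault (core dumped) warning. I call it a warning because the loop keeps iterating and nothing else happens.

However, since there is only one input.txt and one output.txt in the working directory, I am worried that all workers may be sharing these files and reading from and writing to them at the same time (which would mess up the rest of the script). Are these concerns justified? Is there a way or a best practice to handle this situation?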

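I cannot quickly and easily check the quality of the foreach loop's output. I could check it manually against the output of a regular loop, but that would take hours.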

You can try:

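# assumes a parallel backend is already registered and that numCores,
# input and pathTo.outfile are defined in the calling session
outfilepaths <- foreach(i = 1:numCores, .export = c("input", "pathTo.outfile")) %dopar% {
    Sys.sleep(i)  # stagger the workers
    td <- tempdir()  # temporary directory of this worker's R session
    outfile <- tempfile("output", td, ".txt")

    # copy the .out file into the temporary directory
    file.copy(pathTo.outfile, td)

    # computations (placeholder data)
    results <- data.frame(X = rnorm(10), Y = rnorm(10))

    write.csv(results, outfile)
    return(outfile)
}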

Extending this solution, it now seems to work after adding the following lines of code to the function that is run in parallel with foreach:
# save the current working directory
overall_wd <- getwd()

# set the working directory to the worker's temporary directory
temp_wd <- tempdir()
setwd(temp_wd)

# copy the executable to the temporary directory so that system() can find it
file.copy(from = file.path(overall_wd, "my_exe_file.out"), to = file.path(temp_wd, "my_exe_file.out"))

#########################################################################
# script that creates the input text file, runs the executable, and     #
# scans the output                                                      #
#########################################################################

# set the working directory back to the original one
setwd(overall_wd)

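The takeaways are:

Each worker needs to read and write files in its own directory rather than in a common shared directory (otherwise all workers read and write the same files at the same time).
The executable needs to be copied to the worker's temporary working directory.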
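
The two points above can be sketched as a small helper. This is only an illustration under stated assumptions: `run_task`, its `command` and `args` parameters, and the shell script standing in for the real executable are all hypothetical names, not part of the original scripts. Because `tempdir()` belongs to the R process, forked workers may share it, so the sketch creates a separate directory per task instead of using `tempdir()` directly. It also checks the exit status so that a crashing executable stops the task instead of passing silently.

```r
# Hypothetical helper: run one task in a private scratch directory.
# `command` and `args` describe how to start the executable, which is assumed
# to read input.txt and write output.txt in its current working directory.
run_task <- function(i, command, args = character()) {
  # task index plus process id keep directory names distinct across workers
  task_dir <- tempfile(pattern = sprintf("task_%d_%d_", i, Sys.getpid()))
  dir.create(task_dir)
  old_wd <- setwd(task_dir)
  # restore the working directory and delete the scratch files, even on error
  on.exit({
    setwd(old_wd)
    unlink(task_dir, recursive = TRUE)
  }, add = TRUE)

  writeLines(as.character(i), "input.txt")
  status <- system2(command, args)
  if (status != 0) {
    stop("task ", i, ": executable exited with status ", status)
  }
  readLines("output.txt")
}

# Stand-in for the real executable: a shell script that prefixes each input line
script <- tempfile(fileext = ".sh")
writeLines("sed 's/^/done /' input.txt > output.txt", script)

# lapply() is used here; the same call could be the body of a foreach %dopar% loop
results <- lapply(1:3, run_task, command = "sh", args = shQuote(script))
```

Since the helper changes the working directory, the executable should be given as an absolute path (or be on the PATH), which would also make copying it into each scratch directory unnecessary.
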
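Yes, these concerns are justified. Do not write to files inside the loop. Export the results after the loop.
Is every core also writing to the same output.txt?
@chinsoon12 Yes, basically the two text files (input and output) are shared. There is one of each in the working directory, and that is all.
RoLand, the problem is that the call to the executable happens in the middle of the script (and inside a function). A lot of work still has to be done after reading the output.txt file back in. Splitting things into two foreach loops would be difficult.
I think we need a reproducible example.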
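
For cases where the work can be split that way, the advice to return results from the workers and write them once after the loop might look like the sketch below. It uses the base `parallel` package only to avoid extra dependencies; `compute_one` and the output file name are hypothetical placeholders. With foreach, passing `.combine = rbind` would play the role of the `do.call(rbind, ...)` step.

```r
library(parallel)

# Hypothetical stand-in for the per-iteration computation
compute_one <- function(i) {
  data.frame(iteration = i, value = i^2)
}

cl <- makeCluster(2)                       # two local workers; adjust as needed
pieces <- parLapply(cl, 1:4, compute_one)  # workers return data and write nothing
stopCluster(cl)

# combine in the main process and write a single file, so there is only one writer
results <- do.call(rbind, pieces)
out_path <- file.path(tempdir(), "all_results.csv")
write.csv(results, out_path, row.names = FALSE)
```
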