R parallel processing error `Error in checkForRemoteErrors(val): 6 nodes produced errors; first error: subscript out of bounds`
Tags: r, for-loop, parallel-processing, parallel.foreach, doparallel
I am learning parallel processing as a way to handle large data sets. I predefined some variables as follows:
CV <- function(mean, sd) {(sd / mean) * 100}
distThreshold <- 5 # Distance threshold
CVThreshold <- 20 # CV threshold
LocalCV <- list()
Num.CV <- list()
I then pass the cluster object clust_cores to parSapply:
for (i in seq(YieldData2rd)) {
LocalCV[[i]] = parSapply(clust_cores, X = 1:length(YieldData2rd[[i]]),
FUN = function(pt) {
d = spDistsN1(YieldData2rd[[i]], YieldData2rd[[i]][pt,])
ret = CV(mean = mean(YieldData2rd[[i]][d < distThreshold, ]$yield),
sd = sd(YieldData2rd[[i]][d < distThreshold, ]$yield))
return(ret)
}) # calculate CV in the local neighbour
}
stopCluster(clust_cores)
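The workers created by makeCluster start as fresh R sessions, so objects that exist only on the master (the data list, CV, distThreshold, the loop index i) are not visible inside FUN unless they are sent over. A minimal sketch of one way around this is to pass everything the function needs as arguments. It assumes plain data frames and a hand-written Euclidean distance in place of the SpatialPointsDataFrame objects and spDistsN1; the names make_layer, layers, and local_cv are hypothetical and not part of the original code.

```r
library(parallel)

CV <- function(mean, sd) (sd / mean) * 100
distThreshold <- 5

# Hypothetical stand-in data: plain data frames instead of SpatialPointsDataFrame objects
set.seed(1)
make_layer <- function(n) {
  data.frame(yield = rnorm(n, mean = 10), x1 = rnorm(n), x2 = rnorm(n))
}
layers <- list(make_layer(50), make_layer(40))

# Local CV around point `pt`; every input arrives as an argument,
# so the function does not depend on globals that exist only on the master
local_cv <- function(pt, dat, threshold, cv_fun) {
  d <- sqrt((dat$x1 - dat$x1[pt])^2 + (dat$x2 - dat$x2[pt])^2)  # assumed Euclidean distance
  nb <- dat$yield[d < threshold]
  cv_fun(mean = mean(nb), sd = sd(nb))
}

cl <- makeCluster(2)
LocalCV <- lapply(layers, function(dat) {
  # extra named arguments are forwarded to local_cv on each worker
  parSapply(cl, seq_len(nrow(dat)), local_cv,
            dat = dat, threshold = distThreshold, cv_fun = CV)
})
stopCluster(cl)
```

An alternative under the same assumptions is to keep the globals and send them to the workers with clusterExport before calling parSapply; passing arguments just makes the dependencies explicit.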
Thanks to @Omry Atia's comment, I started looking into the foreach package and made a first attempt:
library(foreach)
library(doParallel)
#setup parallel backend to use many processors
cores=detectCores()
clust_cores <- makeCluster(cores[1]-1) #not to overload your computer
registerDoParallel(clust_cores)
LocalCV = foreach(i = seq(YieldData2rd), .combine=list, .multicombine=TRUE) %dopar% {
LocalCV[[i]] = sapply(X = 1:length(YieldData2rd[[i]]),
FUN = function(pt) {
d = spDistsN1(YieldData2rd[[i]], YieldData2rd[[i]][pt,])
ret = CV(mean = mean(YieldData2rd[[i]][d < distThreshold, ]$yield),
sd = sd(YieldData2rd[[i]][d < distThreshold, ]$yield))
return(ret)
}) # calculate CV in the local neighbour
}
stopCluster(clust_cores)
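Could you please provide a sample of YieldData2rd? Without it, the code cannot be run. Also, you export i to the cluster before it is defined.
Please see my edited question. The small sample data set provided as a reproducible example runs fine in the original for loop. I exported i to the cluster because otherwise it reports that object i cannot be found. When I run the code as in the revised version above, I get a different error: Error in checkForRemoteErrors(val): 4 nodes produced errors; first error: object 'i' not found.
The reason is that parApply is not suited to indexed data: have a look at the function foreach, which is the parallel version of a for loop. Hope this helps.
@OmryAtia Thanks, I tried rewriting the code with the foreach package and posted an answer below, which seems to work fine for me. Just one question: how do I know it is actually faster than the for loop?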
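One way to check the last point is to time both versions on the same input with system.time and compare the elapsed values. The sketch below uses a hypothetical CPU-bound function, slow_task, as a stand-in for the per-layer computation and parLapply from the base parallel package. It is an illustration under those assumptions, not a benchmark of the code in this thread.

```r
library(parallel)

# Hypothetical CPU-bound task standing in for the local-CV computation
slow_task <- function(n) {
  x <- 0
  for (k in seq_len(n)) x <- x + sqrt(k)
  x
}
inputs <- rep(2e5, 8)

# Sequential baseline
t_seq <- system.time(res_seq <- lapply(inputs, slow_task))

# Same work spread over two workers
cl <- makeCluster(2)
t_par <- system.time(res_par <- parLapply(cl, inputs, slow_task))
stopCluster(cl)

# Wall-clock seconds for each version
elapsed <- c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]])
print(elapsed)
```

For small inputs the parallel run can be slower, because starting workers and moving data between processes has a cost. The comparison is only meaningful on data of realistic size, and ideally repeated a few times.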
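# Reproducible sample data: three simulated yield layers
library(sp) # supplies coordinates() and spDistsN1() (rgdal, used originally, has been retired from CRAN)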
Yield1 <- data.frame(yield=rnorm(460, mean = 10), x1=rnorm(460, mean = 1843235), x2=rnorm(460,mean = 5802532))
Yield2 <- data.frame(yield=rnorm(408, mean = 10), x1=rnorm(408, mean = 1843235), x2=rnorm(408, mean = 5802532))
Yield3 <- data.frame(yield=rnorm(369, mean = 10), x1=rnorm(369, mean = 1843235), x2=rnorm(369, mean = 5802532))
coordinates(Yield1) <- c('x1', 'x2')
coordinates(Yield2) <- c('x1', 'x2')
coordinates(Yield3) <- c('x1', 'x2')
YieldData2rd <- list(Yield1, Yield2, Yield3)
library(foreach)
library(doParallel)
# set up a parallel backend that uses several cores
cores <- detectCores()
clust_cores <- makeCluster(max(1, cores[1] - 1)) # leave one core free so the machine stays responsive
registerDoParallel(clust_cores)
# .packages = "sp" loads sp on every worker so spDistsN1() and the Spatial* methods are available there
LocalCV <- foreach(i = seq_along(YieldData2rd), .combine = list, .multicombine = TRUE,
                   .packages = "sp") %dopar% {
  pts <- YieldData2rd[[i]]
  # the value of this sapply() call is what foreach collects for layer i
  sapply(X = seq_len(length(pts)),
         FUN = function(pt) {
           d <- spDistsN1(pts, pts[pt, ]) # distances from point pt to every point in the layer
           nb <- pts[d < distThreshold, ]$yield # yields inside the local neighbourhood
           CV(mean = mean(nb), sd = sd(nb)) # CV in the local neighbourhood
         })
}
stopCluster(clust_cores)