
Parallel computation of Gower's distance in R


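I am trying to compute Gower's distance between the observations of a single dataset. I found a useful function that uses parallel computation.

The code is as follows: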

computeDistance <- function(dt1, dt2, nThreads = 4) {
  # Determine chunk-size to be processed by different threads
  s <- floor(nrow(dt1) / nThreads)

  # Setup multi-threading
  modelRunner <- makeCluster(nThreads)
  registerDoParallel(modelRunner)

  # For numeric variables, build ranges (max-min) to be used in gower-distance.
  # Ensure that the ranges is computed on the overall data and not on
  # the chunks in the parallel threads. Also, note that function 'gower.dist()'
  # seems to be buggy in regards to missing values (NA), which can be fixed by
  # providing ranges for all numeric variables in the function-call

  dt <- rbind(dt1, dt2)
  rngs <- rep(NA, ncol(dt))
  for (i in seq_len(ncol(dt))) {
    col <- dt[[i]]
    if (is.numeric(col)) {
      # Spell out TRUE: the shorthand T is an ordinary variable and can be overwritten
      rngs[i] <- max(col, na.rm = TRUE) - min(col, na.rm = TRUE)
    }
  }

  # Compute distance in parallel threads; note that you have to include packages
  # which must be available in the different threads
  distanceMatrix <-
    foreach(
      i = 1:nThreads, .packages = c("StatMatch"), .combine = "rbind",
      .export = "computeDistance", .inorder = TRUE
    ) %dopar% {
      # Compute chunks
      from <- (i - 1) * s + 1
      to <- i * s
      if (i == nThreads) {
        to <- nrow(dt1)
      }

      # Compute distance-matrix for each chunk
      # distanceMatrix <- daisy(dt1[from:to,],metric = "gower")
      distanceMatrix <- gower.dist(dt1[from:to,], dt2, rngs = rngs)
    }

  # Clean-up
  stopCluster(modelRunner)
  return(distanceMatrix)
}    
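
The comment inside the function about ranges refers to how Gower's distance scales numeric variables: each absolute difference is divided by that variable's range, so the range has to come from the full data rather than from one chunk. Below is a minimal base-R sketch of that idea. `gower_numeric` is a hypothetical helper written only for illustration; it assumes complete numeric columns with non-zero ranges and is not the `StatMatch::gower.dist()` implementation.

```r
# Hypothetical helper: Gower-style distance for numeric columns only.
# Each per-variable difference is scaled by a range supplied by the caller.
gower_numeric <- function(x, y, rngs) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  out <- matrix(0, nrow(x), nrow(y))
  for (j in seq_len(ncol(x))) {
    # |x_ij - y_kj| / range_j for every pair of rows (i, k)
    out <- out + abs(outer(x[, j], y[, j], "-")) / rngs[j]
  }
  out / ncol(x)
}

# Made-up example data
full <- data.frame(a = c(0, 5, 10), b = c(1, 2, 3))

# Ranges taken from the full data, as the function above does
rngs <- sapply(full, function(col) max(col) - min(col))

d_full <- gower_numeric(full, full, rngs)

# Rows 1:2 treated as a "chunk", still scaled by the overall ranges
d_chunk <- gower_numeric(full[1:2, ], full, rngs)
```

Because the chunk is scaled with the overall ranges, `d_chunk` matches the first two rows of `d_full`. Had the ranges been recomputed inside the chunk, the rows would no longer be comparable across workers.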
I tried adding the clue package to the .packages argument of the foreach function, but it did not work. Any help is appreciated.

Note: you need to install the following packages to run the function:

library(StatMatch)
library(doParallel) 
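
A side note on the chunking in `computeDistance`: the chunk size comes from `floor(nrow(dt1) / nThreads)`, which is 0 when there are fewer rows than workers, so `from:to` becomes `1:0`. As a sketch of an alternative, `splitIndices()` from the base `parallel` package splits a run of row indices into near-equal chunks. The row and worker counts below are made-up values.

```r
library(parallel)

n_rows    <- 10  # assumed number of rows in dt1
n_workers <- 4   # assumed number of workers

# Chunking as in the function above: equal blocks, last worker takes the rest
s <- floor(n_rows / n_workers)
manual <- lapply(seq_len(n_workers), function(i) {
  from <- (i - 1) * s + 1
  to <- if (i == n_workers) n_rows else i * s
  from:to
})

# Alternative: let parallel::splitIndices() balance the chunks
balanced <- splitIndices(n_rows, n_workers)

# Compare how many rows each worker would receive
print(lengths(manual))
print(lengths(balanced))
```

Inside the `foreach` loop, each worker could then take `dt1[balanced[[i]], ]` instead of `dt1[from:to, ]`.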

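Error message:

Error in e$fun(obj, substitute(ex), parent.frame(), e$data) : 
worker initialization failed: package ‘clue’ could not be loaded

Comments:

You would gain more by rewriting this code with Rcpp than by trying to parallelize it.

@ACE By the way, I have no problem with your code; it ran successfully for me.

@CPak It seems to work fine on my local machine, but I get the error when I run the code on RStudio Server.

Are you sure StatMatch is installed on RStudio Server?

I was able to find a solution through the following link: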
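
The error above says a worker could not load `clue`, and one commenter asks whether the packages are really installed where the code runs. That can be checked directly on the workers. The sketch below starts a small cluster and asks each worker for its library paths and whether `StatMatch` and `clue` can be loaded there. The cluster size of 2 is an arbitrary choice, and this is only a diagnostic, not the solution the asker eventually found.

```r
library(parallel)

cl <- makeCluster(2)  # assumed small cluster, just for the check

# Library paths seen by each worker; they may differ from the main session
worker_paths <- clusterEvalQ(cl, .libPaths())

# Can each worker load the packages named in the error and in .packages?
worker_has <- clusterEvalQ(cl, c(
  StatMatch = requireNamespace("StatMatch", quietly = TRUE),
  clue      = requireNamespace("clue", quietly = TRUE)
))

stopCluster(cl)

print(worker_paths)
print(worker_has)
```

If a worker reports `FALSE` for `clue`, installing it into a library that appears in that worker's `.libPaths()` is one thing to try.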