R: converting code to run in shared memory

The code below helps determine the optimal number of clusters:

library(purrr)  # provides map_dbl

set.seed(123)

# function to compute total within-cluster sum of squares
wss <- function(k) {
  kmeans(df, k, nstart = 10)$tot.withinss
}

# Compute and plot wss for k = 1 to k = 15
k.values <- 1:15

# extract wss for 1-15 clusters
wss_values <- map_dbl(k.values, wss)

plot(k.values, wss_values,
     type = "b", pch = 19, frame = FALSE,
     xlab = "Number of clusters K",
     ylab = "Total within-clusters sum of squares")
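Here, i is the number of parallel starts, and k is the number of clusters; we need to try each value of k to find the optimal number of clusters.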
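If the goal is to parallelize the i random starts for a single k (rather than looping over k), a minimal sketch could look like this. It assumes a Unix-alike, where `parallel::mclapply` forks child processes that read the data through shared, copy-on-write memory; on Windows it falls back to one core. `best_kmeans` is a hypothetical helper, and the question's `df` is replaced by the scaled numeric `iris` columns as a stand-in.

```r
library(parallel)

df <- scale(iris[, 1:4])  # hypothetical stand-in for the question's df

# L'Ecuyer-CMRG gives each forked child its own reproducible RNG stream
RNGkind("L'Ecuyer-CMRG")
set.seed(123)

# Forking is unavailable on Windows, so use a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical helper: run `starts_per_core` random starts on each core for ONE k
# and keep the fit with the smallest total within-cluster sum of squares.
best_kmeans <- function(data, k, starts_per_core, n_cores) {
  runs <- mclapply(rep(starts_per_core, n_cores),
                   function(i) kmeans(data, k, nstart = i),
                   mc.cores = n_cores)
  wss <- vapply(runs, function(r) r$tot.withinss, numeric(1))
  runs[[which.min(wss)]]
}

fit <- best_kmeans(df, k = 3, starts_per_core = 25, n_cores = n_cores)
fit$tot.withinss
```

With i starts on each of n cores this performs i × n starts in total, which is roughly comparable to a single `kmeans(df, k, nstart = i * n)`, though the chosen fit will not be bit-identical.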

To loop over k.values in parallel with foreach, you can use:

ncores <- parallel::detectCores(logical = FALSE)
cl <- parallel::makeCluster(ncores)
doParallel::registerDoParallel(cl)
library(foreach)
wss_values2 <- foreach(k = k.values, .combine = 'c') %dopar% {
  kmeans(df, k, nstart = 10)$tot.withinss
}
parallel::stopCluster(cl)
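A self-contained version of the loop above, as a sketch: it uses two PSOCK workers so it runs anywhere, and substitutes the scaled `iris` columns for the question's `df` (an assumption, since the original data is not shown). Because `.combine = 'c'` concatenates one scalar per iteration, the result should contain exactly `length(k.values)` values.

```r
library(foreach)
library(doParallel)

df <- scale(iris[, 1:4])  # hypothetical stand-in for the question's df
k.values <- 1:15

cl <- parallel::makeCluster(2)  # two workers keeps the sketch portable
registerDoParallel(cl)

# df is referenced in the loop body, so foreach exports it to the workers;
# .combine = 'c' joins the 15 scalar results into one numeric vector.
wss_values2 <- foreach(k = k.values, .combine = 'c') %dopar% {
  kmeans(df, k, nstart = 10)$tot.withinss
}

parallel::stopCluster(cl)
length(wss_values2)  # one value per element of k.values
```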

If you wrap the kmeans call in a function, you need to pass all the variables it uses (df and k) to it as arguments.
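A minimal sketch of that advice, assuming the same foreach/doParallel setup: the hypothetical wrapper `wss_fun` receives the data and k explicitly, so it does not rely on a global `df` that the worker processes never see. Both `wss_fun` and `df` appear in the loop expression, so foreach exports them. Scaled `iris` again stands in for the question's data.

```r
library(foreach)
library(doParallel)

df <- scale(iris[, 1:4])  # hypothetical stand-in for the question's df
k.values <- 1:15

# Hypothetical wrapper: data and k arrive as arguments instead of being
# looked up as globals inside the worker.
wss_fun <- function(data, k) {
  kmeans(data, k, nstart = 10)$tot.withinss
}

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# wss_fun and df are both named in the expression, so both are exported
wss_values3 <- foreach(k = k.values, .combine = 'c') %dopar% wss_fun(df, k)

parallel::stopCluster(cl)
```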

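I got 5 values as the result of wss_values2. I'm trying to understand whether I should now run wss_values; I'm not sure I follow.

None of the parallel code I posted in the question produces the plot from https://uc-r.github.io/kmeans_clustering. I'm trying to compute the same result with a parallel version.

This ran for 30 seconds on a 30,000-record dataset. I should get 15 values, but I only get one value in wss_values2. Besides my own dataset, I also tried the code on the iris data, but still got only 1 value.

Did you get 15 values in wss_values2? Try running my code with df.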
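One way to catch the "only one value" symptom before plotting is to check the length of the result against `k.values`. In this sketch, `sapply` stands in for whichever parallel version produces the vector (swap in the foreach or mclapply call), and scaled `iris` is an assumed stand-in for `df`. The plot call mirrors the one in the question.

```r
df <- scale(iris[, 1:4])  # hypothetical stand-in for the question's df
k.values <- 1:15

# Replace this line with the parallel version under test
set.seed(123)
wss_result <- sapply(k.values, function(k) kmeans(df, k, nstart = 10)$tot.withinss)

# Fail early if the result does not hold one WSS value per k
stopifnot(length(wss_result) == length(k.values))

plot(k.values, wss_result,
     type = "b", pch = 19, frame = FALSE,
     xlab = "Number of clusters K",
     ylab = "Total within-clusters sum of squares")
```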
Warning message:
In mclapply(c(25, 25, 25, 25), k.values, FUN = parallel.wss) :
  all scheduled cores encountered errors in user code
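The signature is `mclapply(X, FUN, ..., mc.cores = ...)`, and only `X` is iterated. In the call above, `c(25, 25, 25, 25)` is `X`, and because `FUN` is named, `k.values` is forwarded through `...` as an extra argument, so each call receives the whole `1:15` vector. Since `parallel.wss` is not shown, the exact error cannot be confirmed here. The sketch below puts `k.values` in the `X` position and shows how to read the per-element errors that `mclapply` returns as `"try-error"` objects when it forks (with one core it falls back to `lapply`, which throws the error directly). Scaled `iris` is an assumed stand-in for `df`.

```r
library(parallel)

df <- scale(iris[, 1:4])  # hypothetical stand-in for the question's df
k.values <- 1:15
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Iterate over k.values; extra arguments for FUN (here `data`) go in `...`
res <- mclapply(k.values,
                function(k, data) kmeans(data, k, nstart = 10)$tot.withinss,
                data = df,
                mc.cores = n_cores)

# When forked, a failing element comes back as a "try-error" with the message
failed <- vapply(res, inherits, logical(1), what = "try-error")
if (any(failed)) {
  cat("First error:", res[failed][[1]])
} else {
  wss_values <- unlist(res)
}
```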