R: How to run the elbow method in parallel to find a suitable k for clustering


Data frame "data.clustering", size 943x2:

> head(data.clustering)
  age gender
2   2      1
3   6      2
4   2      1
5   2      1
6   6      2
7   6      1
When I use the elbow method to find the k value:

library(GMD)  # provides css.hclust() and elbow.batch()

elbow.k <- function(mydata){
  ## determine a "good" k using elbow
  dist.obj <- dist(mydata)
  hclust.obj <- hclust(dist.obj)
  css.obj <- css.hclust(dist.obj, hclust.obj)
  elbow.obj <- elbow.batch(css.obj)
  #   print(elbow.obj)
  k <- elbow.obj$k
  return(k)
}

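# find k value
start.time <- Sys.time()
k.clusters <- elbow.k(data.clustering)
end.time <- Sys.time()
# difftime() picks its own units unless told; force seconds so the message is correct
elapsed <- as.numeric(difftime(end.time, start.time, units = "secs"))
cat('Time to find k using Elbow method is', elapsed, 'seconds with k value:', k.clusters, '\n')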

This takes far too long:
Time to find k using Elbow method is 24.01472 seconds with k value: 10
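For intuition about what the GMD pipeline measures, here is a base-R sketch that cuts the same complete-linkage hclust tree at k = 1..kmax with cutree() and sums each cluster's squared distances to its mean. wss_hclust() is a hypothetical helper, and treating its output as roughly the within-cluster sum of squares that css.hclust() reports is our assumption, not something checked against GMD's source.

```r
# Hypothetical helper: within-cluster sum of squares (WSS) for each cut
# k = 1..kmax of a complete-linkage hclust tree.
wss_hclust <- function(mydata, kmax = 10) {
  mydata <- as.matrix(mydata)
  tree <- hclust(dist(mydata))  # same defaults as elbow.k(): Euclidean, complete linkage
  sapply(seq_len(kmax), function(k) {
    groups <- cutree(tree, k = k)
    # for each cluster: squared distances of its rows to the cluster's column means
    per_cluster <- sapply(split(seq_len(nrow(mydata)), groups), function(idx) {
      sub <- mydata[idx, , drop = FALSE]
      sum(sweep(sub, 2, colMeans(sub))^2)
    })
    sum(per_cluster)
  })
}

m <- matrix(c(1, 2, 10, 11), ncol = 1)
wss_hclust(m, kmax = 4)  # returns c(82, 1, 0.5, 0)
```

On the question's data this would be called as wss_hclust(data.clustering, kmax = 10), and the resulting vector can be plotted against 1:10 to look for the bend.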
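In R you can use the parallel package (library(parallel)), but you have to remember to use clusterEvalQ() and clusterExport() to load packages and export variables into the workers' environment. I think your code would look like this: library(parallel)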
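A minimal sketch of that approach, under our own assumptions: a socket (PSOCK) cluster from makeCluster(), which also works on Windows, running one k-means fit per value of k. parallel_wss() and wss_for_k() are hypothetical names, and k-means stands in for the question's hclust-based css.hclust()/elbow.batch() pipeline.

```r
library(parallel)

# Worker function (hypothetical). Defined at top level, so `mydata` is looked
# up in the worker's global environment, where clusterExport() places it.
wss_for_k <- function(k) {
  kmeans(mydata, centers = k, nstart = 5)$tot.withinss
}

# Hypothetical helper: total within-cluster sum of squares for k = 1..kmax,
# one k per task, on a PSOCK cluster.
parallel_wss <- function(mydata, kmax = 10, cores = 2) {
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl))

  # Workers start as fresh R sessions: load packages and export data explicitly.
  clusterEvalQ(cl, library(stats))  # e.g. library(GMD) if the worker needed it
  clusterExport(cl, varlist = "mydata", envir = environment())
  clusterSetRNGStream(cl, iseed = 42)  # reproducible random starts on the workers

  unlist(parLapply(cl, seq_len(kmax), wss_for_k))
}

# Usage on the question's data frame:
# wss <- parallel_wss(data.clustering, kmax = 10)
```

Because the workers are separate R sessions, `mydata` only exists there after clusterExport(). wss_for_k() is defined at top level so that parLapply() does not ship an enclosing function environment along with it.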

Below is a shared-memory parallel example that uses k-means to create an elbow plot:

library(parallel)

# WSS for each k in `min_max`; element 1 is the total sum of squares (WSS at k = 1).
# Slots for k values outside `min_max` (other than 1) are left as NA.
elbow <- function(min_max, frame) {
  set.seed(42)
  wss <- (nrow(frame) - 1) * sum(apply(frame, 2, var))
  for (i in min_max) {
    wss[i] <- sum(kmeans(frame, centers = i, algorithm = "MacQueen")$withinss)
  }
  return(wss)
}

parallel_elbow <- function(kmax, frame_choice, cores = getOption("mc.cores", 2L)) {
  # split k = 2..kmax into chunks of at most 3 values each
  cut_point <- 3
  centers_vec <- 2:kmax
  x <- seq_along(centers_vec)
  chunks <- split(centers_vec, ceiling(x / cut_point))

  # shared-memory (forked) parallelism, one chunk per task;
  # mclapply() forks, so on Windows it only works with mc.cores = 1
  results <- mclapply(chunks, FUN = elbow, frame = frame_choice, mc.cores = cores)

  # gather the results: k = 1 from the first run, then each chunk's own k values
  wss <- results[[1]][1]
  for (j in seq_along(chunks)) {
    wss[chunks[[j]]] <- results[[j]][chunks[[j]]]
  }

  # create scree plot of all wss values
  plot(seq_along(wss), wss, type = "b", xlab = "Number of Clusters",
       ylab = "Within groups sum of squares", pch = 16,
       main = "Elbow Plot", col = "steelblue")
  invisible(wss)
}
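The curve plotted by elbow() is the total within-cluster sum of squares W(k) of the k-means solution with clusters C_1, ..., C_k and centroids μ_j; sum(kmeans(...)$withinss) is exactly this sum. The starting value (nrow(frame)-1)*sum(apply(frame,2,var)) is W(1): with a single cluster the centroid is the vector of column means, and n − 1 times a column's sample variance s_p^2 is that column's sum of squared deviations. The elbow is the k after which W(k) stops dropping sharply.

$$
W(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
W(1) = \sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2 = (n-1)\sum_{p=1}^{d} s_p^2
$$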

Comparing runtimes on a document-term matrix of 2176 documents:

system.time(elbow(1:10, dtm))
user  system elapsed 
83.130   1.450  84.843 

system.time(parallel_elbow(10, dtm))
user  system elapsed 
21.097   0.653  48.132
[Figure: orange shows the serial run, blue the parallel run]
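The answer above leaves the elbow to be read off the scree plot. If k should be chosen automatically from a vector of WSS values for k = 1, 2, 3 and so on, one common heuristic (our swapped-in choice, not what GMD's elbow.batch() does) is to rescale both axes to [0, 1] and take the k whose point lies farthest from the straight line joining the first and last points. pick_elbow() is a hypothetical helper.

```r
# Hypothetical helper: pick the "elbow" of a WSS curve as the point farthest
# from the chord joining the first and last points (axes rescaled to [0, 1]).
# Assumes wss[1] is the value for k = 1, wss[2] for k = 2, and so on.
pick_elbow <- function(wss) {
  stopifnot(length(wss) >= 3, max(wss) > min(wss))
  n <- length(wss)
  k <- seq_len(n)
  x <- (k - 1) / (n - 1)
  y <- (wss - min(wss)) / (max(wss) - min(wss))
  dx <- x[n] - x[1]
  dy <- y[n] - y[1]
  # perpendicular distance of each (x, y) to the line through points 1 and n
  d <- abs(dy * x - dx * y + x[n] * y[1] - y[n] * x[1]) / sqrt(dx^2 + dy^2)
  k[which.max(d)]
}

pick_elbow(c(100, 80, 20, 18, 16, 15))  # 3
```

Rescaling matters: without it the WSS axis, measured in squared data units, would dominate the distance and the choice would depend on the scale of the data.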




Nice answer. Could you add some comments on the elbow method so the whole approach is easier to understand?