Pairwise K-means in R


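I have a dataset to which I want to apply K-means clustering to form groups. However, I only want to consider pairs of variables.

The dataset has a class variable. I want this class variable left out of the clustering and used instead to evaluate how well the algorithm performs.

I want to automate this, so every possible combination of two variables must be tried, and only the best one returned.

How can I do this in R?

You can use the Iris dataset as an example.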

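Welcome to SO! How about something like this, which keeps all the models (and everything about them; for only the best combination, see the bottom of the answer):

If you only want the "best" model, which in this case is the one with the best ratio on this index (note: I have never used it myself, so please check the formula), here is another loop: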

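library(dplyr) # external package to make it simpler

# all pairs of the four measurement columns (column 5, Species, is left out)
comb <- combn(names(iris[, -5]), 2, simplify = FALSE)
# one score per pair
listed_1 <- list()

for (i in seq_along(comb)) {
  names_ <- comb[[i]]
  df <- iris[, names_]   # keep only the two variables of this pair
  km <- kmeans(df, 3)    # random starting centres: scores can vary between runs
  # count how many flowers of each species fall into each cluster
  df <- data.frame(cl = km$cluster, spec = iris$Species, cnt = 1)
  df <- aggregate(df$cnt, list(cl = df$cl, spec = df$spec), sum)
  # for each species keep its most populated cluster (exactly one row, even on ties)
  df <- df %>% group_by(spec) %>% slice_max(x, n = 1, with_ties = FALSE)
  # percentage of flowers that sit in their species' majority cluster
  listed_1[[i]] <- round(sum(df$x) / nrow(iris), 2) * 100
}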
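The loop above calls kmeans() with a single random start and no seed, so the scores, and possibly the winning pair, can change from run to run. Below is a base-R sketch of the same score without dplyr. It assumes a fixed seed and `nstart = 25`, neither of which is in the answer above; `agreement()` is a hypothetical helper name.

```r
set.seed(42)  # assumed fixed seed: makes the runs repeatable

pairs <- combn(names(iris)[1:4], 2, simplify = FALSE)

# Hypothetical helper reproducing the answer's score: for each species, count
# the flowers in its most populated cluster, sum, and divide by all flowers.
agreement <- function(cluster, truth) {
  tab <- table(cluster, truth)   # rows = clusters, columns = species
  sum(apply(tab, 2, max)) / length(truth)
}

scores <- sapply(pairs, function(p) {
  km <- kmeans(iris[, p], centers = 3, nstart = 25)  # 25 random starts, best kept
  agreement(km$cluster, iris$Species)
})
names(scores) <- sapply(pairs, paste, collapse = " + ")

# percentages, best pair first
sort(round(scores * 100), decreasing = TRUE)
```

With `nstart = 25`, kmeans() keeps the start with the lowest total within-cluster sum of squares, which makes the ranking much less sensitive to a single unlucky initialisation.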


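Thank you very much. How can I rank the K-means runs by the number of instances that belong to the same class (their purity)?

Sorry, I don't understand the question. Could you elaborate (perhaps by showing the order you want)?

What I want is to rank the models using the purity measure. One more question: I don't understand what do.call means in this line:

cbind(do.call(rbind, listed_1), do.call(rbind, comb))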
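In case a concrete case helps: `do.call(f, args)` calls `f` once, passing the elements of the list `args` as its arguments. So `do.call(rbind, listed_1)` is the same as writing `rbind(listed_1[[1]], listed_1[[2]], ...)` by hand, which stacks the six scores into a 6 x 1 matrix. Likewise, `do.call(rbind, comb)` stacks the six name pairs into a 6 x 2 matrix, and `cbind()` then glues the two side by side. The tiny sketch below uses two made-up scores; the values are placeholders, not results.

```r
# hypothetical stand-ins shaped like listed_1 (scores) and comb (variable pairs)
scores <- list(88, 95)
pairs  <- list(c("Sepal.Length", "Petal.Length"), c("Petal.Length", "Petal.Width"))

# do.call(rbind, scores) is exactly the call rbind(scores[[1]], scores[[2]])
stacked <- do.call(rbind, scores)
identical(stacked, rbind(scores[[1]], scores[[2]]))  # TRUE
dim(stacked)                  # 2 1
dim(do.call(rbind, pairs))    # 2 2

# cbind() of a numeric and a character matrix yields a character matrix,
# which is why the score shows up as the string "95" in the answer
res <- cbind(stacked, do.call(rbind, pairs))
res
```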

Thanks. Your answer was really helpful.
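On ranking by purity: purity is usually defined per cluster. For each cluster, take the number of members of its most frequent class, sum these counts over all clusters, and divide by N, i.e. purity = (1/N) Σ_k max_j |cluster_k ∩ class_j|. The loop in the answer takes the maximum per species instead, so the same cluster can be counted for two species. The sketch below computes purity for each pair and sorts the pairs from best to worst. It assumes a fixed seed and `nstart = 25`; `cluster_purity()` and `ranking` are names made up for this example.

```r
set.seed(42)  # assumed fixed seed so the ranking is repeatable

# Hypothetical helper: share of observations belonging to the majority
# class of their own cluster
cluster_purity <- function(cluster, truth) {
  tab <- table(cluster, truth)              # rows = clusters, columns = classes
  sum(apply(tab, 1, max)) / length(truth)   # majority count of each cluster
}

pairs <- combn(names(iris)[1:4], 2, simplify = FALSE)
ranking <- data.frame(
  var1 = sapply(pairs, `[`, 1),
  var2 = sapply(pairs, `[`, 2),
  purity = sapply(pairs, function(p) {
    km <- kmeans(iris[, p], centers = 3, nstart = 25)
    cluster_purity(km$cluster, iris$Species)
  })
)

# highest purity first: the first row is the "best" pair under this measure
ranking <- ranking[order(ranking$purity, decreasing = TRUE), ]
ranking
```

Keeping the score in a numeric data frame column (rather than a character matrix) lets `order()` sort it numerically.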

listed[[2]]
K-means clustering with 3 clusters of sizes 51, 58, 41

Cluster means:
  Sepal.Length Petal.Length
1     5.007843     1.492157
2     5.874138     4.393103
3     6.839024     5.678049

Clustering vector:
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 2 3 2 2 2 2 2 2 2 2 2 2 2 2
 [66] 2 2 2 2 2 2 2 2 2 2 2 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 3 2 3 3 3 3 2 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 2 3 3 2 2 3 3
[131] 3 3 3 3 3 3 3 3 2 3 3 3 2 3 3 3 2 3 3 2

Within cluster sum of squares by cluster:
[1]  9.893725 23.508448 20.407805
 (between_SS / total_SS =  90.5 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"         "iter"        
[9] "ifault"
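The printout above is `listed[[2]]`, one element of a list of fitted `kmeans` objects. The code that builds that list does not appear on this page. Below is a minimal sketch of how such a list could be built, one model per variable pair, assuming the same `combn()` pairs and 3 centres; the seed is an added assumption. The `betweenss` and `totss` components give the `between_SS / total_SS` figure reported in the printout.

```r
set.seed(42)  # assumed seed: makes the fitted models repeatable

comb <- combn(names(iris[, -5]), 2, simplify = FALSE)

# one kmeans fit per pair of variables, named after the pair
listed <- lapply(comb, function(p) kmeans(iris[, p], centers = 3))
names(listed) <- sapply(comb, paste, collapse = " + ")

# share of the total sum of squares explained by the clustering, per pair
ratio <- sapply(listed, function(km) km$betweenss / km$totss)
round(ratio * 100, 1)
```

With this pair order, `listed[[2]]` is the Sepal.Length + Petal.Length model, the same pair shown in the printout. Its cluster sizes may differ from run to run.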


res <- cbind(do.call(rbind, listed_1), do.call(rbind, comb))
res[which.max(as.numeric(res[, 1])), ]
[1] "95"           "Petal.Length" "Petal.Width"