Finding the number of clusters with the clusGap function in R

Could you help me find the ideal number of clusters using the clusGap function? There is a similar example at this link:

But I would like to do this for my own case. My code is as follows:

library(cluster)

df <- structure(
list(Propertie = c(1,2,3,4,5,6,7,8), Latitude = c(-24.779225, -24.789635, -24.763461, -24.794394, -24.747102,-24.781307,-24.761081,-24.761084),
Longitude = c(-49.934816, -49.922324, -49.911616, -49.906262, -49.890796,-49.8875254,-49.8875254,-49.922244),
waste = c(526, 350, 526, 469, 285, 433, 456,825)),class = "data.frame", row.names = c(NA, -8L))

df<-scale(df)

hcluster = clusGap(df, FUN = hcut, K.max = 100, B = 50)
Clustering k = 1,2,..., K.max (= 100): .. Error in sil.obj[, 1:3] : incorrect number of dimensions

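The problem here is that you set K.max to 100, but your dataset contains only eight observations. As stated in the clusGap documentation, K.max is "the maximum number of clusters to consider". So in your case, K.max cannot be greater than seven (one fewer than the number of observations).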
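A likely explanation for the exact error message, sketched below under assumptions: factoextra's hcut appears to compute a silhouette for every cut, and cluster::silhouette() returns NA instead of a matrix when every observation is its own cluster (k = n). Indexing that NA like a matrix then fails with "incorrect number of dimensions". The sketch uses base hclust with Ward linkage ("ward.D2" on Euclidean distances, assumed to mirror hcut's defaults) rather than hcut itself, so it only needs the cluster package that ships with R.

```r
library(cluster)  # silhouette(); a recommended package shipped with R

# Same eight observations as in the question
df <- data.frame(
  Propertie = 1:8,
  Latitude  = c(-24.779225, -24.789635, -24.763461, -24.794394,
                -24.747102, -24.781307, -24.761081, -24.761084),
  Longitude = c(-49.934816, -49.922324, -49.911616, -49.906262,
                -49.890796, -49.8875254, -49.8875254, -49.922244),
  waste     = c(526, 350, 526, 469, 285, 433, 456, 825)
)
x  <- scale(df)
d  <- dist(x)
hc <- hclust(d, method = "ward.D2")  # assumed stand-in for hcut's default
n  <- nrow(x)

# k = n - 1: at least one cluster has two members, silhouette is defined
sil_ok <- silhouette(cutree(hc, k = n - 1), d)

# k = n: all singletons, silhouette() returns NA rather than a matrix
sil_bad <- silhouette(cutree(hc, k = n), d)

# Matrix-style indexing of that NA reproduces the error from the question
msg <- tryCatch({ sil_bad[, 1:3]; "" }, error = function(e) conditionMessage(e))

K.max <- n - 1  # largest usable value, 7 for this dataset
```

Deriving K.max from nrow() instead of hard-coding it keeps the call valid if observations are added or removed later.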

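I am not sure clustering is meaningful for such a small dataset. Nonetheless, see the working implementation below. I adapted the plot_clusgap function from the R/Bioconductor package to visualize the results.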

library(data.table)
library(cluster)
library(factoextra) # for hcut function
library(ggplot2)    # for ggplot(); factoextra also attaches it

df <- data.table(properties = c(1,2,3,4,5,6,7,8),
                latitude = c(-24.779225, -24.789635, -24.763461, -24.794394, -24.747102,-24.781307,-24.761081,-24.761084),
                longitude = c(-49.934816, -49.922324, -49.911616, -49.906262, -49.890796,-49.8875254,-49.8875254,-49.922244),
                waste = c(526, 350, 526, 469, 285, 433, 456,825))

df <- scale(df)

# perform clustering, B = 500 is recommended
hcluster = clusGap(df, FUN = hcut, K.max = 7, B = 500)

# extract results
dat <- data.table(hcluster$Tab)
dat[, k := .I]

# visualize gap statistic
p <- ggplot(dat, aes(k, gap)) + geom_line() + geom_point(size = 3) +
  geom_errorbar(aes(ymax = gap + SE.sim, ymin = gap - SE.sim), width = 0.25) +
  ggtitle("Clustering Results") +
  labs(x = "Number of Clusters", y = "Gap Statistic") +
  theme(plot.title = element_text(size = 16, hjust = 0.5, face = "bold"),
        axis.title = element_text(size = 12, face = "bold"))
print(p)  # assigning to p does not draw the plot; print it explicitly
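
The plot shows the gap curve but leaves the choice of k to the eye. cluster::maxSE() applies a selection rule to the gap values and their simulation standard errors; "firstSEmax" picks the first k whose gap is within SE.factor standard errors of the first local maximum. To keep the sketch dependency-free it swaps factoextra's hcut for a hypothetical base-R helper, hclust_cut, using Ward linkage (an assumption, not the answer's exact clustering), and sets K.max from the number of rows.

```r
library(cluster)  # clusGap(), maxSE()

# Same eight observations as in the question
df <- data.frame(
  Propertie = 1:8,
  Latitude  = c(-24.779225, -24.789635, -24.763461, -24.794394,
                -24.747102, -24.781307, -24.761081, -24.761084),
  Longitude = c(-49.934816, -49.922324, -49.911616, -49.906262,
                -49.890796, -49.8875254, -49.8875254, -49.922244),
  waste     = c(526, 350, 526, 469, 285, 433, 456, 825)
)
x <- scale(df)

# Hypothetical helper standing in for factoextra::hcut: Ward hierarchical
# clustering cut into k groups, returned in the list(cluster = ...) shape
# that clusGap() expects from FUNcluster.
hclust_cut <- function(x, k) {
  list(cluster = cutree(hclust(dist(x), method = "ward.D2"), k = k))
}

set.seed(123)  # the reference datasets are drawn at random
gap <- clusGap(x, FUNcluster = hclust_cut, K.max = nrow(x) - 1, B = 100)

# Apply a selection rule to the gap curve instead of reading it off the plot
k_best <- maxSE(f = gap$Tab[, "gap"], SE.f = gap$Tab[, "SE.sim"],
                method = "firstSEmax", SE.factor = 1)
print(k_best)
```

print(gap, method = "firstSEmax") reports the same choice alongside the full table, and method = "Tibs2001SEmax" applies the rule from the original gap-statistic paper (Tibshirani et al., 2001). With only eight observations, expect the selected k to change between rules and random seeds.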

Thank you very much for your help =D