Finding the number of clusters with the clusGap function in R
Can you help me find the ideal number of clusters using the clusGap function? There is a similar example at this link, but I would like to do the same for my own case. My code is as follows:
library(cluster)
df <- structure(
list(Propertie = c(1,2,3,4,5,6,7,8), Latitude = c(-24.779225, -24.789635, -24.763461, -24.794394, -24.747102,-24.781307,-24.761081,-24.761084),
Longitude = c(-49.934816, -49.922324, -49.911616, -49.906262, -49.890796,-49.8875254,-49.8875254,-49.922244),
waste = c(526, 350, 526, 469, 285, 433, 456,825)),class = "data.frame", row.names = c(NA, -8L))
df<-scale(df)
hcluster = clusGap(df, FUN = hcut, K.max = 100, B = 50)
Clustering k = 1,2,..., K.max (= 100): .. Error in sil.obj[, 1:3] : incorrect number of dimensions
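The problem here is that you set K.max to 100, but your data set contains only eight observations. As the clusGap documentation states, K.max is the maximum number of clusters to consider. In your case, therefore, K.max cannot be greater than seven.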
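As a minimal sketch of that limit, the upper bound can be derived from the data instead of being hard-coded. The name k_max is only illustrative, and the bound of one less than the number of rows follows the reasoning above, not a rule spelled out in the clusGap documentation.

```r
# Same eight observations as in the question
df <- data.frame(
  Propertie = 1:8,
  Latitude  = c(-24.779225, -24.789635, -24.763461, -24.794394,
                -24.747102, -24.781307, -24.761081, -24.761084),
  Longitude = c(-49.934816, -49.922324, -49.911616, -49.906262,
                -49.890796, -49.8875254, -49.8875254, -49.922244),
  waste     = c(526, 350, 526, 469, 285, 433, 456, 825)
)
x <- scale(df)

# Illustrative upper bound: one fewer cluster than there are observations
k_max <- nrow(x) - 1
k_max
```

Passing k_max as K.max keeps the call valid if rows are later added to or removed from the data.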
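I am not sure that clustering is appropriate for such a small data set. Nonetheless, see the working implementation below. I adapted the plot_clusgap function from an R/Bioconductor package to visualize the results.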
library(data.table)
library(cluster)
library(factoextra) # for hcut function
library(ggplot2)    # for the plot; loaded explicitly for clarity
df <- data.table(properties = c(1,2,3,4,5,6,7,8),
latitude = c(-24.779225, -24.789635, -24.763461, -24.794394, -24.747102,-24.781307,-24.761081,-24.761084),
longitude = c(-49.934816, -49.922324, -49.911616, -49.906262, -49.890796,-49.8875254,-49.8875254,-49.922244),
waste = c(526, 350, 526, 469, 285, 433, 456,825))
df <- scale(df)
# perform clustering, B = 500 is recommended
hcluster = clusGap(df, FUN = hcut, K.max = 7, B = 500)
# extract results
dat <- data.table(hcluster$Tab)
dat[, k := .I]
# visualize gap statistic
p <- ggplot(dat, aes(k, gap)) + geom_line() + geom_point(size = 3) +
  geom_errorbar(aes(ymax = gap + SE.sim, ymin = gap - SE.sim), width = 0.25) +
  ggtitle("Clustering Results") +
  labs(x = "Number of Clusters", y = "Gap Statistic") +
  theme(plot.title = element_text(size = 16, hjust = 0.5, face = "bold"),
        axis.title = element_text(size = 12, face = "bold"))
# display the plot
print(p)
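The plot leaves the choice of k to the eye. As a hedged sketch, the cluster package also provides maxSE(), which picks a k from the gap values and their standard errors. To keep the example dependent on the cluster package only, it swaps hcut for a small wrapper around base hclust and cutree. The wrapper name hclust_cut, the seed, B = 100, and the "firstSEmax" rule are all assumptions made for illustration, not part of the answer above.

```r
library(cluster)

# Same eight observations as in the question
df <- data.frame(
  Propertie = 1:8,
  Latitude  = c(-24.779225, -24.789635, -24.763461, -24.794394,
                -24.747102, -24.781307, -24.761081, -24.761084),
  Longitude = c(-49.934816, -49.922324, -49.911616, -49.906262,
                -49.890796, -49.8875254, -49.8875254, -49.922244),
  waste     = c(526, 350, 526, 469, 285, 433, 456, 825)
)
x <- scale(df)

# Hypothetical helper: hierarchical clustering cut into k groups (base R only).
# clusGap expects a function of (x, k) returning a list with a `cluster` element.
hclust_cut <- function(x, k) list(cluster = cutree(hclust(dist(x)), k = k))

set.seed(42)  # the reference data sets are simulated, so fix the seed
gap <- clusGap(x, FUNcluster = hclust_cut, K.max = nrow(x) - 1,
               B = 100, verbose = FALSE)

# Pick k from the gap values and their simulation standard errors
tab <- gap$Tab
best_k <- maxSE(f = tab[, "gap"], SE.f = tab[, "SE.sim"], method = "firstSEmax")
best_k
```

Other selection rules such as "Tibs2001SEmax" or "globalmax" can give a different k, so with only eight observations the result is best read as a rough indication.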
Thank you very much for your help! =D