Error in R NbClust: not enough objects to cluster
I'm trying to determine the optimal number of clusters for a cluster analysis with the NbClust method in R, following the approach described in a book. However, I get an error message saying:

Error in hclust(md, method = "average") : must have n >= 2 objects to cluster

even though hclust on its own seems to work. So I assume the problem (as the error message also suggests) is that NbClust tries to create groups that contain only one object.

Here is my code:
mydata = read.table("PLR_2016_WM_55_5_Familienstand_aufbereitet.csv", skip = 0, sep = ";", header = TRUE)
mydata <- mydata[-1] # Drop the first column (int)
data.transformed <- t(mydata) # Transpose the matrix
data.scale <- scale(data.transformed) # Scaling of table
data.dist <- dist(data.scale) # Calculates distances between points
fit.average <- hclust(data.dist, method = "average")
plot(fit.average, hang = -1, cex = .8, main = "Average Linkage Clustering")
library(NbClust)
nc <- NbClust(data.scale, distance = "euclidean",
              min.nc = 2, max.nc = 15, method = "average")
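The error message itself comes from hclust(): it stops when the distance object describes fewer than two objects. Because dist() treats each row as one object, a matrix with a single row is enough to trigger it. A minimal base-R sketch with toy data (no NbClust involved) of that condition, and of how t() changes the number of objects:

```r
# Base R only: reproduce the hclust() error without NbClust.
# dist() treats each ROW as one object, so a 1-row matrix gives Size 1.
one_row <- matrix(c(1, 2, 3), nrow = 1)
d1 <- dist(one_row)
print(attr(d1, "Size"))  # 1

msg <- tryCatch(
  {
    hclust(d1, method = "average")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # exact wording depends on the R locale

# Transposing turns the 3 columns into 3 rows, i.e. 3 objects.
d3 <- dist(t(one_row))
print(attr(d3, "Size"))  # 3
fit <- hclust(d3, method = "average")
print(length(fit$order))  # 3
```

This only illustrates which condition the message refers to; it does not show where inside NbClust a single object ends up being clustered.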
There are a few problems with your dataset.
The last 4 rows contain no data and must be removed:
mydata <- read.table("PLR_2016_WM_55_5_Familienstand_aufbereitet.csv", skip = 0, sep = ";", header = TRUE)
mydata <- mydata[1:(nrow(mydata)-4),]
mydata[,1] <- as.numeric(as.character(mydata[,1])) # as.character() first, in case column 1 was read as a factor
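Removing exactly 4 rows works for this particular file. A slightly more general sketch, assuming the empty trailing rows are read in as NA or blank strings, drops every row whose cells are all missing; drop_empty_rows is a hypothetical helper, not part of any package. The last lines show why as.numeric() on its own is risky for a column that read.table() stored as a factor: it returns the internal level codes rather than the printed numbers.

```r
# Hypothetical helper: drop rows in which every cell is NA or blank.
drop_empty_rows <- function(df) {
  is_blank <- function(x) is.na(x) | trimws(as.character(x)) == ""
  # Element-wise AND across columns: TRUE where the whole row is blank
  all_blank <- Reduce(`&`, lapply(df, is_blank))
  df[!all_blank, , drop = FALSE]
}

# Toy data: rows 4 and 5 carry no values.
example <- data.frame(
  id = c("1", "2", "3", "", NA),
  a  = c(10, 20, 30, NA, NA),
  b  = c(5, 6, 7, NA, NA),
  stringsAsFactors = FALSE
)
cleaned <- drop_empty_rows(example)
print(nrow(cleaned))  # 3

# Factor pitfall: levels are sorted as strings ("10", "20", "5").
f <- factor(c("10", "5", "20"))
codes  <- as.numeric(f)                # 1 3 2 (level codes)
values <- as.numeric(as.character(f))  # 10 5 20 (actual values)
print(codes)
print(values)
```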
Then we remove one row (row 72) from data.scale and transpose it:
data.scale <- t(data.scale[-72,])
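Since hclust() needs at least two objects, it can be worth checking how many complete rows the matrix has before calling NbClust. One way rows can be lost, assumed here for illustration and not confirmed for this file, is scale(): a column with zero variance is divided by a standard deviation of 0 and becomes NaN in every row, so anything that drops incomplete rows (such as na.omit()) removes all of them. check_before_clustering is a hypothetical helper and the data are toy values.

```r
# Hypothetical pre-flight check for a matrix passed to hclust()/NbClust().
check_before_clustering <- function(m) {
  m <- as.matrix(m)
  list(
    n_rows          = nrow(m),
    n_complete_rows = sum(complete.cases(m)),    # rows without NA/NaN
    nan_columns     = which(colSums(is.na(m)) > 0)
  )
}

# Toy data: column "const" has zero variance.
toy <- cbind(const = c(4, 4, 4), x = c(1, 2, 3), y = c(2, 9, 4))
toy_scaled <- scale(toy)
print(toy_scaled[, "const"])  # NaN NaN NaN

report <- check_before_clustering(toy_scaled)
print(report$n_complete_rows)  # 0: no row survives na.omit()

# Dropping zero-variance columns before scaling avoids the NaNs.
keep <- apply(toy, 2, sd) > 0
report2 <- check_before_clustering(scale(toy[, keep]))
print(report2$n_complete_rows)  # 3
```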
Running the NbClust call from the question on this data.scale gives:
[1] "Frey index : No clustering structure in this data set"
*** : The Hubert index is a graphical method of determining the number of clusters.
In the plot of Hubert index, we seek a significant knee that corresponds to a
significant increase of the value of the measure i.e the significant peak in Hubert
index second differences plot.
*** : The D index is a graphical method of determining the number of clusters.
In the plot of D index, we seek a significant knee (the significant peak in Dindex
second differences plot) that corresponds to a significant increase of the value of
the measure.
*******************************************************************
* Among all indices:
* 8 proposed 2 as the best number of clusters
* 4 proposed 3 as the best number of clusters
* 8 proposed 4 as the best number of clusters
* 1 proposed 5 as the best number of clusters
* 1 proposed 8 as the best number of clusters
* 1 proposed 11 as the best number of clusters
***** Conclusion *****
* According to the majority rule, the best number of clusters is 2
*******************************************************************
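In this output, 2 and 4 clusters each receive 8 votes, yet the conclusion reports 2. The sketch below tallies the votes printed above with base R. It assumes, without checking NbClust's source, that ties go to the smallest candidate, which is what which.max() does because it returns the first maximum. It also illustrates what a "peak in the second differences" means for the graphical Hubert and D indices, using made-up index values; this is a simplified reading, not NbClust's exact computation.

```r
# Votes copied from the printed NbClust summary above.
votes <- setNames(c(8, 4, 8, 1, 1, 1), c(2, 3, 4, 5, 8, 11))
best_k <- as.integer(names(votes)[which.max(votes)])
print(best_k)  # 2 (which.max() returns the first of the tied maxima)

# Second differences of an index measured over k = 2..8.
# These index values are made up for illustration only.
k <- 2:8
index <- c(0.10, 0.20, 0.45, 0.50, 0.52, 0.53, 0.54)
d2 <- diff(index, differences = 2)
# d2[i] = index[i + 2] - 2 * index[i + 1] + index[i], centred on k[i + 1]
knee_k <- k[which.max(abs(d2)) + 1]
print(knee_k)  # 4: the index rises steeply up to k = 4, then flattens
```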
Thanks, your answer helped a lot.