R NbClust package error

Tags: r, cluster-analysis

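I am trying to run the NbClust package on my data (100 rows x 130 columns) to determine how many clusters to choose, but whenever I apply it to the full dataset I keep getting this error: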

> nc <- NbClust(mydata, distance="euclidean", min.nc=2, max.nc=99, method="ward",
index="duda")     
[1] "There are only 100 nonmissing observations out of a possible 100 observations."
Error in NbClust(mydata, distance = "euclidean", min.nc = 2, max.nc = 99,  : 
The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.

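I ran into the same problem when working with a matrix that has more columns than rows. The issue can affect other R functions as well, for example princomp when doing a PCA (in that case you should use prcomp instead).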
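The princomp/prcomp remark can be checked with a small base R sketch. The matrix `wide` is made-up random data (an assumption for illustration, not the asker's dataset) with more columns than rows.

```r
set.seed(1)
# Hypothetical data: 20 observations (rows) x 50 variables (columns)
wide <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)

# princomp() stops with an error when there are fewer rows than columns;
# the error message is captured as a string instead of aborting the script
res <- tryCatch(princomp(wide), error = function(e) conditionMessage(e))
print(res)

# prcomp() is based on the singular value decomposition and accepts wide data
pca <- prcomp(wide)
length(pca$sdev)  # number of components returned
```

Here `res` holds the error message from princomp, while prcomp returns a result with one standard deviation per component.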

When the matrix has more columns than rows, my workaround was simply to pass the transposed matrix:

NbClust(t(mydata), distance="euclidean", min.nc=2, max.nc=99, method="ward", 
index="duda")
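Before relying on the transposed call, it may help to see what transposing changes. The sketch below uses a made-up random matrix as a stand-in for `mydata` (an assumption) and base R's `dist`, which computes distances between rows. After `t()`, the rows are the original columns, so the objects being compared are the 130 variables rather than the 100 observations.

```r
set.seed(1)
# Hypothetical stand-in for mydata: 100 observations (rows) x 130 variables (columns)
mydata <- matrix(rnorm(100 * 130), nrow = 100, ncol = 130)

d_rows <- dist(mydata, method = "euclidean")     # distances between observations
d_cols <- dist(t(mydata), method = "euclidean")  # distances between variables

attr(d_rows, "Size")  # 100 objects compared
attr(d_cols, "Size")  # 130 objects compared
```

Whether clustering the variables is acceptable depends on the analysis; if the goal is to group the observations, the transposed call answers a different question.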

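I am fairly sure I have found the cause of this error message, and it is essentially related to the data. I looked at the source code of the NbClust package and found that the error is raised near the beginning of the function: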

NbClust <- function(data, diss="NULL", distance = "euclidean", min.nc=2, max.nc=15, method = "ward", index = "all", alphaBeale = 0.1)
{
    x <- 0
    min_nc <- min.nc
    max_nc <- max.nc
    jeu1 <- as.matrix(data)
    numberObsBefore <- dim(jeu1)[1]
    jeu <- na.omit(jeu1) # returns the object with incomplete cases removed
    nn <- numberObsAfter <- dim(jeu)[1]
    pp <- dim(jeu)[2]
    TT <- t(jeu)%*%jeu
    sizeEigenTT <- length(eigen(TT)$value)
    eigenValues <- eigen(TT/(nn-1))$value
    for (i in 1:sizeEigenTT)
    {
        if (eigenValues[i] < 0) {
            print(paste("There are only", numberObsAfter,"nonmissing observations out of a possible", numberObsBefore ,"observations."))
            stop("The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.")
        }
    }
    # ... (the excerpt ends here; the rest of the function is not shown)
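The check in this excerpt can be reproduced in base R without NbClust. The sketch below is an illustration on made-up random data with the same shape as in the question (100 x 130, no missing values), not the package's own code path. The matrix `t(jeu) %*% jeu` is 130 x 130 but has rank at most 100, so at least 30 of its eigenvalues are zero in exact arithmetic. In floating point some of them can come out as tiny negative numbers, which is enough to trigger the `eigenValues[i] < 0` test even though nothing is missing.

```r
set.seed(1)
# Hypothetical data with the shape from the question: 100 rows, 130 columns
jeu <- matrix(rnorm(100 * 130), nrow = 100, ncol = 130)
nn <- nrow(jeu)

TT <- crossprod(jeu)  # equivalent to t(jeu) %*% jeu, a 130 x 130 matrix
ev <- eigen(TT / (nn - 1), symmetric = TRUE, only.values = TRUE)$values

# Count eigenvalues that are numerically zero relative to the largest one
n_near_zero <- sum(abs(ev) < 1e-8 * max(ev))
n_near_zero

# Rounding error may push some of these just below zero
any(ev < 0)
```

This is consistent with the message reporting 100 nonmissing observations out of 100: the "too many missing values" wording is misleading when the real cause is having more columns than rows.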

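I don't know what is going wrong inside the NbClust function, but you can apply the different indices one at a time in a loop. (If you want to use this code, you have to replace base_multi_sinna with your own data.)

library(NbClust)

# Indices to evaluate (passed to the index argument of NbClust)
lista.methods = c("kl", "ch", "hartigan", "mcclain", "gamma", "gplus",
                  "tau", "dunn", "sdindex", "sdbw", "cindex", "silhouette",
                  "ball", "ptbiserial", "gap", "frey")
# The first entry is only the label of the column that holds the index names
lista.distance = c("metodo", "euclidean", "maximum", "manhattan", "canberra")

# One row per index, one column per distance (plus the label column)
tabla = as.data.frame(matrix(ncol = length(lista.distance), nrow = length(lista.methods)))
names(tabla) = lista.distance
tabla[, 1] = lista.methods

for (j in 2:length(lista.distance)) {
  for (i in seq_along(lista.methods)) {
    nb = NbClust(base_multi_sinna, distance = lista.distance[j],
                 min.nc = 2, max.nc = 10,
                 method = "complete", index = lista.methods[i])
    # Store the first element of Best.nc for this index/distance pair
    tabla[i, j] = nb$Best.nc[1]
  }
}

tabla

Comments:

This seems to work when I use the Duda index, but if I try to get the number of clusters from all indices I get another error message: "Error in solve.default(W) : system is computationally singular: reciprocal condition number = 3.65978e-17". Apparently the Beale index produces NaNs. One more thing I was wondering about: when the distance matrix is computed, the distances are calculated between rows. So isn't the result affected by transposing the matrix, since it now computes distances between what used to be the columns?

Sorry for my typo, I meant "the number of clusters from all indices" in the first sentence.

You do realize that this makes you cluster the features rather than the samples, right? This is very dangerous advice and should not be given without context.

This happened to me when I was testing with a sample of 10 elements. When I switched to 1000 elements, the error no longer occurred.

Is base_multi_sinna supposed to be the original data frame to begin with? Thanks.

Very useful.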
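A loop over many indices and distances stops at the first call that raises an error, such as the "computationally singular" failure from solve reported in this thread. One possible way to keep going is to wrap each call in tryCatch. The helper `safe_call` below is hypothetical (it is not part of NbClust), and the singular matrix is only a stand-in that makes base R's `solve` fail in a similar way.

```r
# Hypothetical helper (not part of NbClust): run f(...) and return NA
# instead of stopping when f raises an error
safe_call <- function(f, ...) {
  tryCatch(f(...), error = function(e) {
    message("skipped: ", conditionMessage(e))
    NA
  })
}

# Stand-in failure: solve() cannot invert a singular matrix
singular <- matrix(1, nrow = 2, ncol = 2)
bad  <- safe_call(solve, singular)  # error is caught, NA is returned
good <- safe_call(solve, diag(2))   # succeeds and returns the inverse
```

In a loop like the one in this thread, the NbClust call could be passed through such a wrapper, and the table cell filled only when the call returned a result (for example when `is.list(nb)` is TRUE), leaving NA for combinations that fail.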