R 使用'时出错;mclust';混合模型估计软件包
当尝试执行单变量混合模型估计时,R 使用'时出错;mclust';混合模型估计软件包,r,eda,R,Eda,当尝试执行单变量混合模型估计时,Rpackagemclust产生以下输出,并带有错误: ---------------------------------------------------- Gaussian finite mixture model fitted by EM algorithm ---------------------------------------------------- Mclust V (univariate, unequal variance) model
R
packagemclust
产生以下输出,并带有错误:
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm
----------------------------------------------------
Mclust V (univariate, unequal variance) model with 9 components:
log.likelihood n df BIC ICL
-25576.3596860173 53940 26.0000000000001 -51436.0056895487 -84761.892259584
Clustering table:
1 2 3 4 5 6 7 8 9
3081 8923 8177 3017 8742 6568 10378 3650 1404
Error in object[as.character(G), modelNames, drop = FALSE] :
incorrect number of dimensions
可复制示例:
library(mclust)
set.seed(12345) # for reproducibility
data(diamonds, package='ggplot2') # use built-in data
myData <- log10(diamonds$price)
# determine number of components
mc <- Mclust(myData)
print(summary(mc))
> str(mc)
List of 15
$ call : language Mclust(data = myData)
$ data : num [1:53940, 1] 2.51 2.51 2.51 2.52 2.53 ...
$ modelName : chr "V"
$ n : int 53940
$ d : num 1
$ G : int 9
$ BIC : num [1:9, 1:2] -64689 -57330 -54802 -52807 -52721 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:9] "1" "2" "3" "4" ...
.. ..$ : chr [1:2] "E" "V"
..- attr(*, "G")= num [1:9] 1 2 3 4 5 6 7 8 9
..- attr(*, "modelNames")= chr [1:2] "E" "V"
..- attr(*, "oneD")= logi TRUE
$ bic : num -51436
$ loglik : num -25576
$ df : num 26
$ hypvol : num NA
$ parameters :List of 4
..$ Vinv : NULL
..$ pro : num [1:9] 0.0606 0.1558 0.1628 0.0411 0.1655 ...
..$ mean : Named num [1:9] 2.7 2.86 3.03 3.24 3.39 ...
.. ..- attr(*, "names")= chr [1:9] "1" "2" "3" "4" ...
..$ variance:List of 5
.. ..$ modelName: chr "V"
.. ..$ d : num 1
.. ..$ G : int 9
.. ..$ sigmasq : num [1:9] 0.004915 0.006932 0.009595 0.000833 0.008896 ...
.. ..$ scale : num [1:9] 0.004915 0.006932 0.009595 0.000833 0.008896 ...
$ classification: num [1:53940] 1 1 1 1 1 1 1 1 1 1 ...
$ uncertainty : num [1:53940] 0.0132 0.0132 0.0134 0.015 0.0152 ...
$ z : num [1:53940, 1:9] 0.987 0.987 0.987 0.985 0.985 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : NULL
- attr(*, "class")= chr "Mclust"
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rebmix_2.6.1 ddst_1.03 evd_2.3-0 orthopolynom_1.0-5
[5] polynom_1.3-8 mixtools_1.0.2 segmented_0.4-0.0 boot_1.3-11
[9] mclust_4.3 MASS_7.3-34
loaded via a namespace (and not attached):
[1] Rcpp_0.11.1 colorspace_1.2-4 digest_0.6.4 ggplot2_1.0.0
[5] grid_3.1.1 gtable_0.1.2 munsell_0.4.2 plyr_1.8.1
[9] proto_0.3-10 reshape2_1.4 scales_0.2.4 stringr_0.6.2
[13] tools_3.1.1
我已经弄明白问题出在哪里了。错误消息是由另一个
mclust
函数mclustModel()
生成的,我的代码在调用mclust()
后一直在执行该函数。该错误是因为mclustModel()
函数需要另一种类型的对象作为第二个参数传递。使用mclustModel()
(而不是Mclust()
)时,正确的调用顺序如下所示
mc <- mclustBIC(myData)
print(summary(mc, myData, parameters = TRUE))
plot(mc)
bestModel <- mclustModel(myData, mc)
print(summary(bestModel, myData))
mc我很惊讶这个问题是从交叉验证(CV)迁移过来的,因为问题不太可能与R语言或mclust
包本身有关,但可能与数据或数据的统计方面有关。奇怪的是,我在简历上没有看到任何人在这方面的注释/评论。另一个问题,可能值得作为一个单独的问题发布,是如何自动确定混合物成分的最小数量,负责任意大份额(比如95%)的分布。我在mclust
包中没有看到这样的截止参数。