Machine learning: How to interpret the output of EM in Weka


I ran the EM algorithm on my data in Weka with the default parameters, but I don't understand how to interpret the output.

=== Run information ===

Scheme:       weka.clusterers.EM -I 100 -N -1 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation:     Chronic_Kidney_Disease-weka.filters.unsupervised.attribute.Remove-R12-weka.filters.unsupervised.attribute.Remove-R3-weka.filters.unsupervised.attribute.Remove-R3-4-weka.filters.unsupervised.attribute.Remove-R5-10,12-20
Instances:    800
Attributes:   6
              age
              bp
              rbc
              pc
              hemo
              class
Test mode:    evaluate on training data

=== Clustering model (full training set) ===

EM
==

Number of clusters selected by cross validation: 6
Number of iterations performed: 100

               Cluster
Attribute               0         1         2         3         4         5
                   (0.29)    (0.22)    (0.38)    (0.02)    (0.04)    (0.05)
===========================================================================
age
  mean            53.5869   65.0962     46.44   51.3652   56.1297    10.939
  std. dev.       12.4505    7.9718    15.546    3.7759   10.2604    6.7004

bp
  mean            77.3114      79.7   71.4394   115.138   92.1235   66.5196
  std. dev.       11.7858   12.1008    8.4722   31.4278    5.8351   10.0583

rbc
  normal         185.8341  165.6585  306.8285   14.0588    7.3129   32.3071
  abnormal        45.4643   13.3988    1.0652    3.3197   29.7885    6.9635
  [total]        231.2984  179.0574  307.8937   17.3785   37.1015   39.2706

pc
  normal          152.713  147.8797  306.8886   13.0467    1.9999   31.4721
  abnormal        78.5854   31.1776     1.005    4.3319   35.1016    7.7985
  [total]        231.2984  179.0574  307.8937   17.3785   37.1015   39.2706

hemo
  mean            10.6591   11.7665   15.0745    9.5796    8.1499   12.0494
  std. dev.        2.1313    1.1677    1.3496    2.5159    2.1512    1.5108

class
  ckd            230.1835   177.972    7.2109   16.3651   36.1014    38.167
  notckd           1.1149    1.0853  300.6828    1.0134         1    1.1036
  [total]        231.2984  179.0574  307.8937   17.3785   37.1015   39.2706

Time taken to build model (full training data) : 13.21 seconds

=== Model and evaluation on training set ===

Clustered Instances

0      218 ( 27%)
1      196 ( 25%)
2      302 ( 38%)
3       12 (  2%)
4       34 (  4%)
5       38 (  5%)

Log likelihood: -11.18988

Please help me understand the output.


Thanks in advance.

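It gives you six clusters, containing 27%, 25%, 38%, 2%, 4% and 5% of the data respectively. (That adds up to more than 100%, so the percentages have been rounded.)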
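To check the rounding remark, here is a small sketch that recomputes the percentages from the counts in the "Clustered Instances" block. The round-half-up rule is my assumption; it happens to reproduce the printed values, but I have not checked how Weka formats them internally.

```python
# Hard-assignment counts copied from the "Clustered Instances" block.
counts = [218, 196, 302, 12, 34, 38]
total = sum(counts)  # should match "Instances: 800"

# Exact share of each cluster, in percent.
exact = [100 * c / total for c in counts]

# Assumption: round half up (not Python's round(), which rounds half to even).
rounded = [int(p + 0.5) for p in exact]

print("exact  :", exact, "sum =", sum(exact))
print("rounded:", rounded, "sum =", sum(rounded))
```

The exact shares sum to 100, while the rounded ones sum to 101, which is why the printed percentages look slightly off.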

It arrived at 6 clusters through cross-validation (training on part of the data and testing on the rest, repeated several times).
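As I understand it, with `-N -1` Weka starts from one cluster and keeps adding clusters for as long as the cross-validated log likelihood improves. The sketch below shows only that selection loop. `select_num_clusters` and `toy_score` are hypothetical names, and the toy scores are made up purely so the loop has something to stop on; a real score would come from fitting EM on the training folds and evaluating on the held-out fold.

```python
def select_num_clusters(cv_score, max_k=20):
    """Increase k while the cross-validated score keeps improving.

    cv_score(k) is assumed to return the average held-out log likelihood
    for a k-cluster model (higher is better).
    """
    k = 1
    best = cv_score(k)
    while k < max_k:
        candidate = cv_score(k + 1)
        if candidate <= best:
            break  # no improvement: keep the current k
        k += 1
        best = candidate
    return k


def toy_score(k):
    # Made-up stand-in for a real cross-validated log likelihood;
    # it simply peaks at k = 3.
    return -abs(k - 3)


chosen = select_num_clusters(toy_score)
print("chosen number of clusters:", chosen)
```

In your run the equivalent loop stopped at 6, which is what "Number of clusters selected by cross validation: 6" reports.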

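For each numeric attribute (age, bp, hemo) it gives the mean and standard deviation of the items in each cluster. For the nominal attributes (rbc, pc, class) it gives a weighted count per value instead. The number in parentheses under each cluster index is that cluster's share of the data.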
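The counts are fractional because EM assigns each instance to every cluster with some probability. A quick sanity check on the `[total]` row, assuming each of the two values of a nominal attribute starts with a count of 1 (add-one smoothing): that assumption is inferred from the numbers, since the totals sum to about 812 = 800 + 6 × 2, not taken from the output itself.

```python
# "[total]" row for rbc (identical for pc and class), one entry per cluster.
totals = [231.2984, 179.0574, 307.8937, 17.3785, 37.1015, 39.2706]
n_instances = 800
n_values = 2  # normal / abnormal; assumed to start at a count of 1 each

# Remove the assumed starting counts to get the soft (weighted) cluster sizes.
soft_counts = [t - n_values for t in totals]

# Share of the data in each cluster, rounded like the header row.
priors = [round(c / n_instances, 2) for c in soft_counts]

print("soft counts:", soft_counts, "sum =", sum(soft_counts))
print("priors     :", priors)
```

Under that assumption the soft counts sum to roughly 800 and the shares match the (0.29) (0.22) (0.38) (0.02) (0.04) (0.05) line. They differ slightly from the "Clustered Instances" percentages, which count each instance once, in its most probable cluster.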

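The log likelihood is a measure of how good the clustering is; training tries to maximize it. It is used to compare which of several possible clusterings is better, and doesn't mean much on its own.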
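To make "compare clusterings" concrete, here is a minimal sketch that scores two hand-written one-dimensional Gaussian mixtures on the same toy data. The data and model parameters are invented for illustration, and averaging per instance is my assumption about how the reported figure is scaled.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def mean_log_likelihood(data, components):
    """Average log density of the data under a mixture.

    components is a list of (weight, mean, std) tuples whose weights sum to 1.
    """
    total = 0.0
    for x in data:
        density = sum(w * gaussian_pdf(x, m, s) for w, m, s in components)
        total += math.log(density)
    return total / len(data)


# Toy values forming two obvious groups (illustrative, not from the dataset).
data = [10.2, 11.0, 9.8, 10.5, 15.1, 14.8, 15.6, 15.0]

two_clusters = [(0.5, 10.4, 0.5), (0.5, 15.1, 0.4)]
one_cluster = [(1.0, 12.75, 2.5)]

ll_two = mean_log_likelihood(data, two_clusters)
ll_one = mean_log_likelihood(data, one_cluster)
print("two clusters:", ll_two)
print("one cluster :", ll_one)
```

The two-cluster model gets the higher (less negative) value, so it fits this data better. The value -11.18988 in your output only becomes informative in the same way, when set against the log likelihood of another clustering of the same data.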