Python: problem with scikit-learn's train/test split


python machine-learning scikit-learn


I am using a random forest classifier for the following classification task:

No. of classes = 11
Y = 50
X = 100

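I used 75% of the data for training and 25% for testing.

However, when I compute the confusion matrix, one diagonal value (27) is greater than 25: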

from sklearn.metrics import confusion_matrix
conf_matrix = confusion_matrix(test_Y, y_prediction)
print(conf_matrix)

[[17  2  0  0  0  0  0  0  0  0  0]
 [ 1 12  2  0  0  0  0  0  0  6  0]
 [ 0  0 22  0  0  0  0  0  2  0  0]
 [ 0  0  0 16  0 12  1  0  0  0  1]
 [ 0  1  0  0 19  0  0  0  6  0  0]
 [ 0  0  0  7  0 18  2  0  0  0  0]
 [ 0  0  0  0  0  0 20  1  0  0  0]
 [ 2  2  2  0  0  0  0 27  0  3  0]
 [ 0  0  0  0  8  1  0  0 13  0  0]
 [ 0  1  0  0  0  1  0  0  0 16  4]
 [ 0  0  0  0  0  0  0  1  0 12 14]]

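When I looked into the cause, I found that the train/test split had not behaved as I expected. The full label vector is balanced:

import numpy as np

# Count how many samples each class has before splitting
yy, counts = np.unique(Y, return_counts=True)
print(yy, counts)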

[ 0  1  2  3  4  5  6  7  8  9 10] [100 100 100 100 100 100 100 100 100 100 100]

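from sklearn.model_selection import train_test_split
train_X, test_X, train_Y, test_Y = train_test_split(X, Y, test_size=0.25, random_state=42)

But after the train/test split, the classes do not all have 25 samples in the test set. Shouldn't they?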

They don't have to, because you did not ask for a stratified train/test split. Change it to:

train_X, test_X, train_Y, test_Y  = train_test_split(X, Y, test_size=0.25, 
                                                     stratify=Y,
                                                     random_state=42)
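
To see the difference, here is a minimal sketch that compares the per-class test counts of a plain split and a stratified split. The data is a synthetic stand-in, not the asker's dataset: the labels mimic the 11 classes with 100 samples each shown in the question, and the feature matrix (5 random columns) is purely an assumption for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data: 11 classes, 100 samples each.
# The feature values are random placeholders; only the labels matter here.
rng = np.random.default_rng(0)
Y = np.repeat(np.arange(11), 100)
X = rng.normal(size=(Y.size, 5))

# Plain random split: per-class test counts are free to vary around 25.
_, _, _, test_Y_plain = train_test_split(
    X, Y, test_size=0.25, random_state=42
)
plain_counts = np.bincount(test_Y_plain, minlength=11)

# Stratified split: class proportions are preserved in train and test.
_, _, _, test_Y_strat = train_test_split(
    X, Y, test_size=0.25, stratify=Y, random_state=42
)
strat_counts = np.bincount(test_Y_strat, minlength=11)

print("plain:     ", plain_counts)
print("stratified:", strat_counts)
```

Both test sets contain 275 samples in total, but only the stratified one is guaranteed to hold 25 samples of every class when the classes are perfectly balanced like this.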

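Ah, I didn't know that. What does it do without stratify=Y? Is this an issue with the new version?

@mk1 With stratify=Y, the class proportions of the labels are preserved in both splits. I'm not sure whether this is a version issue; please check.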
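
As a quick check of what the unstratified split actually produced, the confusion matrix posted in the question can be summed row by row. In scikit-learn's `confusion_matrix`, rows correspond to true labels, so each row sum is the number of test samples of that class. The sketch below simply retypes the matrix from the question; the variable name `per_class_test_counts` is made up for illustration.

```python
import numpy as np

# Confusion matrix copied from the question (rows = true class, columns = predicted class)
conf_matrix = np.array([
    [17,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 1, 12,  2,  0,  0,  0,  0,  0,  0,  6,  0],
    [ 0,  0, 22,  0,  0,  0,  0,  0,  2,  0,  0],
    [ 0,  0,  0, 16,  0, 12,  1,  0,  0,  0,  1],
    [ 0,  1,  0,  0, 19,  0,  0,  0,  6,  0,  0],
    [ 0,  0,  0,  7,  0, 18,  2,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0, 20,  1,  0,  0,  0],
    [ 2,  2,  2,  0,  0,  0,  0, 27,  0,  3,  0],
    [ 0,  0,  0,  0,  8,  1,  0,  0, 13,  0,  0],
    [ 0,  1,  0,  0,  0,  1,  0,  0,  0, 16,  4],
    [ 0,  0,  0,  0,  0,  0,  0,  1,  0, 12, 14],
])

# Each row sum is the number of test samples whose true label is that class
per_class_test_counts = conf_matrix.sum(axis=1)
print(per_class_test_counts)        # [19 21 24 30 26 27 21 36 22 22 27]
print(per_class_test_counts.sum())  # 275
```

The total is 275, which is 25% of 1100, but the classes are unevenly represented. Class 7 ended up with 36 test samples, which is why its diagonal entry can reach 27.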
