Python: deviance loss scores on training data don't match clf.train_score_

python scikit-learn

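TL;DR: I am trying to understand the meaning of the train_score_ attribute of a GradientBoostingClassifier, in particular why it does not match my attempt below to compute it directly: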

my_train_scores = [clf.loss_(y_train, y_pred) for y_pred in clf.staged_predict(X_train)]

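In more detail: I am interested in the loss scores on the test and training data at the different fitting stages of my classifier. Computing the loss scores on the test data with staged_predict and loss_ (the test_scores line in my code) is fine with me. My problem is with the training loss scores. The documentation suggests using clf.train_score_:

"The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. If subsample == 1 this is the deviance on the training data."

But these clf.train_score_ values do not match my attempt to compute them directly in my_train_scores above. What am I missing?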

The code I used, together with its output, is listed at the end of this page.

The self.train_score_ attribute can be recreated as follows:
import numpy as np
import matplotlib.pyplot as plt

# clf, X_test and y_test come from the question's code.
# staged_decision_function yields raw decision scores, not hard class labels.
test_dev = []
for pred in clf.staged_decision_function(X_test):
    test_dev.append(clf.loss_(y_test, pred))

ax = plt.gca()
ax.plot(np.arange(clf.n_estimators) + 1, test_dev, color='#d7191c', label='Test', linewidth=2, alpha=0.7)
ax.plot(np.arange(clf.n_estimators) + 1, clf.train_score_, color='#2c7bb6', label='Train', linewidth=2, alpha=0.7, linestyle='--')
ax.set_xlabel('n_estimators')
plt.legend()
plt.show()

See the resulting plot. Note that the two curves lie on top of each other because the training and test data are the same data.
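The key point in the snippet above is that the deviance is evaluated on raw scores (or, equivalently, on probabilities), while staged_predict yields hard class labels. As a rough sketch of a version-independent check, the training loss can be recomputed from staged_predict_proba with sklearn.metrics.log_loss. The sample size, the random_state values and the variable names here are illustrative assumptions, not part of the original post. Depending on the scikit-learn version, train_score_ may be reported as the log loss itself or as twice the log loss, so the sketch looks at the ratio rather than assuming a particular scale.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Illustrative data; n_samples and random_state are assumptions for reproducibility.
X, y = make_hastie_10_2(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default loss (binomial deviance / log loss), default subsample=1.0.
clf = GradientBoostingClassifier(n_estimators=5, random_state=0)
clf.fit(X_train, y_train)

# Score the staged *probabilities*, not the staged class labels.
my_scores = np.array([
    log_loss(y_train, proba, labels=clf.classes_)
    for proba in clf.staged_predict_proba(X_train)
])

# The ratio should be the same constant at every stage (the scale is version dependent).
ratio = clf.train_score_ / my_scores
print(my_scores)
print(clf.train_score_)
print(ratio)
```

If the ratio is constant across stages, the recomputed curve and train_score_ describe the same quantity up to scale, which is not the case when hard labels from staged_predict are passed to the loss.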
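Thanks, but my question is: why are the clf.train_score_ values not the same as the scores I recreated?
Does this explain the difference you are seeing?
No, but thanks, Kevin. My question is specific to GradientBoostingClassifier.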
Code from the question:

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

# Uses loss='deviance' and clf.loss_ as in the original post;
# later scikit-learn releases removed both.
X, y = make_hastie_10_2()
X_train, X_test, y_train, y_test = train_test_split(X, y)
clf = GradientBoostingClassifier(n_estimators=5, loss='deviance')
clf.fit(X_train, y_train)

# Loss per fitting stage, computed from the staged class predictions
test_scores = [clf.loss_(y_test, y_pred) for y_pred in clf.staged_predict(X_test)]
print(test_scores)
print(clf.train_score_)
my_train_scores = [clf.loss_(y_train, y_pred) for y_pred in clf.staged_predict(X_train)]
print(my_train_scores, '<= NOT the same values as in the previous line. Why?')

Output:
[0.71319004170311229, 0.74985670836977902, 0.79319004170311214, 0.55385670836977885, 0.32652337503644546]
[ 1.369166    1.35366377  1.33780865  1.32352935  1.30866325]
[0.65541226392533436, 0.67430115281422309, 0.70807893059200089, 0.51096781948088987, 0.3078567083697788] <= NOT the same values as in the previous line. Why?
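As a sanity check on the scale of the train_score_ values above, the following is a small NumPy sketch. It assumes the binomial deviance has the form -2 * mean(y * raw - log(1 + exp(raw))) with y encoded as 0/1 and raw being the raw decision score; that form and the helper name binomial_deviance are assumptions made for illustration, not code taken from the post. Under this assumption an uninformative model (raw score 0, probability 0.5) has deviance 2 * ln 2, about 1.386, which is in line with train_score_ starting near 1.37 and decreasing slowly.

```python
import numpy as np

def binomial_deviance(y01, raw):
    """Assumed deviance: -2 * mean(y * raw - log(1 + exp(raw))), with y in {0, 1}."""
    y01 = np.asarray(y01, dtype=float)
    raw = np.asarray(raw, dtype=float)
    return -2.0 * np.mean(y01 * raw - np.logaddexp(0.0, raw))

y = np.array([0, 1, 0, 1])

# Raw score 0 means p = 0.5 for every sample: the uninformative baseline.
baseline = binomial_deviance(y, np.zeros(4))

# Slightly informative raw scores push the deviance below the baseline.
better = binomial_deviance(y, np.array([-0.2, 0.2, -0.2, 0.2]))

print(baseline, 2 * np.log(2))
print(better)
```

Values such as 0.71 or 0.33 in the first and third output lines are therefore not on the same scale: they come from passing hard class labels where raw scores are expected.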
