Python: Getting the error of individual data points when using cross-validation (scikit-learn)

python scikit-learn

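I am using cross-validation to evaluate my ML model, but now I want to study the distribution of the errors. That is, I want the average error of a particular data point when it falls in the test set.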

import numpy as np
from sklearn import linear_model
from sklearn.model_selection import KFold, cross_val_score

X = ...  # data points (feature matrix)
y = ...  # output (target values)

lm = linear_model.LinearRegression()

kfold = KFold(n_splits=10)

# 'neg_mean_squared_error' yields the negated MSE, one value per fold
scores = cross_val_score(lm, X, y, scoring='neg_mean_squared_error', cv=kfold)
rmse_scores = [np.sqrt(abs(s)) for s in scores]
print('Testing RMSE (lin reg): {:.3f}'.format(np.mean(rmse_scores)))

Is there a simple way to get the individual error of each data point in the test set (not the training error) with cross-validation in scikit-learn?

Thanks, everyone!

If I understood your question correctly, this should be what you are looking for:

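import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

model = LinearRegression()  # assumed here; any scikit-learn regressor works

kf = KFold(n_splits=3)

error = []

# X and y must be NumPy arrays for the index-based slicing below
for train_index, val_index in kf.split(X, y):
    X_train, X_val = X[train_index], X[val_index]
    y_train, y_val = y[train_index], y[val_index]

    model.fit(X_train, y_train)

    pred = model.predict(X_val)

    current_error = mean_squared_error(y_val, pred)  # error per iteration (fold)

    error.append(current_error)

print(np.mean(error))  # mean error over all folds after CV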
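The loop above reports one MSE per fold. A minimal sketch of how it might be adjusted to keep the error of every held-out point instead is shown below. The synthetic data from make_regression and the name point_errors are assumptions for illustration only and are not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic stand-in data (an assumption; replace with your own X and y).
X, y = make_regression(n_samples=60, n_features=4, noise=5.0, random_state=0)

model = LinearRegression()
kf = KFold(n_splits=3)

# One slot per data point; with KFold each point is held out exactly once.
point_errors = np.full(len(y), np.nan)

for train_index, val_index in kf.split(X):
    model.fit(X[train_index], y[train_index])
    pred = model.predict(X[val_index])
    # Signed residual (true minus predicted) for each held-out point.
    point_errors[val_index] = y[val_index] - pred

squared_errors = point_errors ** 2
print('Overall test RMSE: {:.3f}'.format(np.sqrt(squared_errors.mean())))
```

With plain KFold every point lands in a test fold once, so there is a single error per point. To average a point's error over several different splits, the same loop could accumulate results over a splitter such as RepeatedKFold.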

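Do you mean you want all the differences between y_pred and y_true?

Yes, the individual differences between y_pred (for the test cases, not the fit itself) and y_true.

cross_val_predict?

@ShihabShahriarKhan That is exactly what I was looking for. Thank you very much! :)

Thank you very much for your reply! Yes, this is very close to what I wanted! After a few tweaks I got it working. However, as @ShihabShahriarKhan pointed out above, there is a simpler solution using the function cross_val_predict.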
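For reference, here is a minimal sketch of the cross_val_predict approach mentioned in the comments. The synthetic data and the names errors and worst are assumptions for illustration; the estimator and the 10-fold split follow the question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-in data (an assumption; replace with your own X and y).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

lm = LinearRegression()
kfold = KFold(n_splits=10)

# Each prediction comes from a model that did not see that point during fitting.
y_pred = cross_val_predict(lm, X, y, cv=kfold)

errors = y - y_pred          # signed out-of-fold error per data point
abs_errors = np.abs(errors)

# Indices of the five points with the largest absolute error, worst first.
worst = np.argsort(abs_errors)[-5:][::-1]
print('Test RMSE: {:.3f}'.format(np.sqrt(np.mean(errors ** 2))))
print('Worst-predicted points:', worst)
```

The errors array is aligned with the rows of X, so it can be passed straight to a histogram or joined back onto the data to inspect which points are predicted poorly.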