scikit-learn: cross_val_score() returns NaN values when using "r2" as the scoring

I am trying to use sklearn's cross_val_score(). Here is the example I tried:

# loocv evaluate random forest on the housing dataset
from numpy import mean
from numpy import std
from numpy import absolute
from pandas import read_csv
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor

# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

# create loocv procedure
cv = LeaveOneOut()
# create model
model = RandomForestRegressor(random_state=1)

# evaluate model
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# force positive
scores = absolute(scores)
# report performance
print('MAE: %.3f (%.3f)' % (mean(scores), std(scores)))

The code above works fine without any problems. However, when I change scoring to 'r2', all values in scores become nan.

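The problem is the combination of LeaveOneOut() with r2 as the scoring function. LeaveOneOut() splits the data so that only one sample is used for testing and all remaining samples are used for training. The trouble starts when R² is computed on such a validation set with this formula:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

The denominator becomes zero because n = 1 (there is only one sample to validate), so $\bar{y} = y_i$: the mean equals the single value you have. That is what produces the nan you observe. This is bound to happen whenever cv equals the number of data points, as shown below: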

# evaluate model: 10 samples with cv=10 leaves one sample per validation fold
scores = cross_val_score(model, X[0:10], y[0:10], scoring='r2', cv=10, n_jobs=-1)
# take absolute values (carried over from the MAE example)
scores = absolute(scores)
# report performance
print('R2: %.3f (%.3f)' % (mean(scores), std(scores)))

Output:
R2: nan (nan)

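
To make the zero denominator concrete, here is a minimal pure-Python sketch of the R² formula. The helper name r2_manual and the sample values are made up for illustration, and this is not scikit-learn's implementation (scikit-learn applies its own checks for degenerate inputs); it only shows that the total sum of squares vanishes when a fold holds a single sample.

```python
import math

def r2_manual(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot, nan when SS_tot is zero."""
    y_bar = sum(y_true) / len(y_true)
    # residual sum of squares (numerator)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # total sum of squares (denominator)
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    if ss_tot == 0:
        # with one sample, y_bar equals that sample, so SS_tot is exactly 0
        return math.nan
    return 1.0 - ss_res / ss_tot

# one validation sample, as in every leave-one-out fold
single_fold = r2_manual([24.0], [21.5])
# three validation samples, as in a fold from a smaller cv value
multi_fold = r2_manual([24.0, 21.6, 34.7], [22.0, 23.0, 33.0])

print(single_fold)  # nan
print(multi_fold)
```

The single-sample call is undefined no matter how good the prediction is, which is why every leave-one-out fold yields nan.
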
Now, when I choose a cv value that leaves more than one sample in each validation fold (n > 1), it works fine:

# evaluate model: 10 samples with cv=3 leaves 3 or 4 samples per validation fold
scores = cross_val_score(model, X[0:10], y[0:10], scoring='r2', cv=3, n_jobs=-1)
# take absolute values (carried over from the MAE example)
scores = absolute(scores)
# report performance
print('R2: %.3f (%.3f)' % (mean(scores), std(scores)))

Output:
R2: 0.662 (0.229)
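
If leave-one-out splitting is still wanted, one possible workaround (not part of the answer above) is to collect the held-out prediction for every sample and compute a single R² over all of them, so the metric sees n samples rather than one. The sketch below assumes synthetic data from make_regression and a small forest to keep it fast; the variable names are illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# synthetic stand-in for the housing data (assumption, for a self-contained example)
X_demo, y_demo = make_regression(n_samples=30, n_features=5, noise=5.0, random_state=1)

model_demo = RandomForestRegressor(n_estimators=20, random_state=1)

# one held-out prediction per sample, each made by a model trained on the other 29
held_out_pred = cross_val_predict(model_demo, X_demo, y_demo, cv=LeaveOneOut())

# a single R^2 over all 30 held-out predictions, so the denominator is not zero
pooled_r2 = r2_score(y_demo, held_out_pred)
print('pooled LOOCV R2: %.3f' % pooled_r2)
```

Note that this pooled value is one score for the whole dataset, not a mean over folds, so there is no per-fold standard deviation to report.
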