Python sklearn.exceptions.NotFittedError: Estimator not fitted, call `fit` before exploiting the model


I tried random forest regression.

The code is as follows:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.feature_selection import SelectKBest, f_regression 
from sklearn.pipeline import make_pipeline, Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV
np.random.seed(0)


d1 = np.random.randint(2, size=(50, 10))
d2 = np.random.randint(3, size=(50, 10))
d3 = np.random.randint(4, size=(50, 10))
Y = np.random.randint(7, size=(50,))


X = np.column_stack([d1, d2, d3])


n_samples, n_feats = X.shape
print(n_samples, n_feats)


kf = KFold(n_splits=5, shuffle=True, random_state=0)

regr = RandomForestRegressor(max_features=None, random_state=0)
pipe = make_pipeline(
    RFECV(estimator=regr, step=3, cv=kf,
          scoring='neg_mean_squared_error', n_jobs=-1),
    GridSearchCV(regr, param_grid={'n_estimators': [100, 300]},
                 cv=kf, scoring='neg_mean_squared_error', n_jobs=-1))

ypredicts = cross_val_predict(pipe, X, Y, cv=kf, n_jobs=-1)

# mean_squared_error returns the MSE; take the square root to get the RMSE
rmse = np.sqrt(mean_squared_error(Y, ypredicts))
print(rmse)
However, I get the following error:

sklearn.exceptions.NotFittedError: Estimator not fitted, call `fit` before exploiting the model.
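For context, this is the exception scikit-learn raises when `predict` is called on an estimator that has never been fitted. The sketch below only reproduces that general behaviour on a standalone forest; it does not claim to explain why the pipeline above triggers it. The names `X_demo`, `y_demo` and `forest` are hypothetical, and the exact wording of the message varies between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import NotFittedError

# Hypothetical toy data, unrelated to the question's X and Y
rng = np.random.RandomState(0)
X_demo = rng.randint(4, size=(20, 5))
y_demo = rng.randint(7, size=20)

forest = RandomForestRegressor(n_estimators=10, random_state=0)

# Calling predict before fit raises NotFittedError
try:
    forest.predict(X_demo)
    raised = False
except NotFittedError as exc:
    raised = True
    print("NotFittedError:", exc)

# After fit, the same call succeeds
forest.fit(X_demo, y_demo)
preds = forest.predict(X_demo)
print(preds.shape)
```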

Edit 1: I also tried:

model = pipe.fit(X,Y)

ypredicts = cross_val_predict(model, X, Y, cv=kf, n_jobs=-1)
pipe.fit(X,Y)
but I got the same error.

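In Python 2.7 (sklearn 0.20), the same code gives me a different error:

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

In Python 2.7 (sklearn 0.20.3):
NotFittedError: Estimator not fitted, call `fit` before exploiting the model.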

Instead of

model = pipe.fit(X,Y)

have you tried

pipe.fit(X,Y)

That is:

pipe.fit(X,Y)
# change model to pipe
ypredicts = cross_val_predict(pipe, X, Y, cv=kf, n_jobs=-1)
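One point that applies to either variant: `cross_val_predict` clones the estimator it is given and fits a fresh copy on each training fold, so fitting the pipeline beforehand should not change its result. A minimal sketch of that behaviour, assuming a plain `RandomForestRegressor` in place of the full pipeline and hypothetical names `X_demo`, `y_demo`, `kf_demo` and `est`:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

# Hypothetical toy data with the same shape as in the question
rng = np.random.RandomState(0)
X_demo = rng.randint(4, size=(50, 30))
y_demo = rng.randint(7, size=50)

kf_demo = KFold(n_splits=5, shuffle=True, random_state=0)
est = RandomForestRegressor(n_estimators=10, random_state=0)

# No prior call to est.fit: cross_val_predict fits a clone on each training fold
oof = cross_val_predict(est, X_demo, y_demo, cv=kf_demo)

# The object passed in is left untouched, so it has no fitted attributes
still_unfitted = not hasattr(est, "estimators_")
print(oof.shape, still_unfitted)
```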

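It seems you are trying to select the best parameters for your model with a grid search, so here is an alternative. You are using a pipeline; in this approach I do not use one and instead obtain the best parameters with a randomized search.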

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.ensemble import RandomForestRegressor

np.random.seed(0)


d1 = np.random.randint(2, size=(50, 10))
d2 = np.random.randint(3, size=(50, 10))
d3 = np.random.randint(4, size=(50, 10))
Y = np.random.randint(7, size=(50,))


X = np.column_stack([d1, d2, d3])


n_samples, n_feats = X.shape
print(n_samples, n_feats)


kf = KFold(n_splits=5, shuffle=True, random_state=0)

regr = RandomForestRegressor(max_features=None,random_state=0)                

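# The search space holds only two candidates, so sampling more is pointless
# (depending on the scikit-learn version, a larger n_iter warns or raises).
n_iter_search = 2
random_search = RandomizedSearchCV(regr, param_distributions={'n_estimators': [100, 300]},
                                   n_iter=n_iter_search, cv=kf, verbose=1,
                                   return_train_score=True)
random_search.fit(X, Y)

# Note: these are in-sample predictions from the refitted best model
ypredicts = random_search.predict(X)
# mean_squared_error returns the MSE; take the square root to get the RMSE
rmse = np.sqrt(mean_squared_error(Y, ypredicts))
print(rmse)
print(random_search.best_params_)
print(random_search.cv_results_)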

Try this code. I hope it fully solves your problem.
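The code above predicts on the same data it was fitted on, so its error is an in-sample figure. If out-of-fold predictions are wanted, as in the question, one option is to pass the search object itself to `cross_val_predict`. This is only a sketch under assumptions: the names `X_demo`, `y_demo`, `inner_cv`, `outer_cv` and `search` are hypothetical, the forest sizes are reduced to keep it fast, and parallelism is left at the default to avoid nesting `n_jobs=-1`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, RandomizedSearchCV, cross_val_predict

# Hypothetical toy data with the same shape as in the question
rng = np.random.RandomState(0)
X_demo = rng.randint(4, size=(50, 30))
y_demo = rng.randint(7, size=50)

# Inner folds tune n_estimators; outer folds produce the held-out predictions
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": [10, 30]},  # small values for speed
    n_iter=2,
    cv=inner_cv,
    scoring="neg_mean_squared_error",
    random_state=0,
)

# Each outer fold runs its own search on the training part only
oof = cross_val_predict(search, X_demo, y_demo, cv=outer_cv)
rmse = float(np.sqrt(mean_squared_error(y_demo, oof)))
print(rmse)
```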


Thanks. I know it selects the best model parameters, but does it also select the best features? I hope you can answer that as well.

No. To select the best features as well, you have to apply a separate feature-selection technique.
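One possible way to handle both at once is to put a feature selector and the regressor in a single pipeline and tune them with one search. The sketch below swaps in `SelectKBest` with `f_regression` instead of the question's `RFECV`, purely as an illustration. The data, the variable names (`X_demo`, `y_demo`, `pipe_demo`, `search`) and the candidate values are assumptions, not something the thread confirms.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline

# Hypothetical toy data with the same shape as in the question
rng = np.random.RandomState(0)
X_demo = rng.randint(4, size=(50, 30))
y_demo = rng.randint(7, size=50)

kf_demo = KFold(n_splits=5, shuffle=True, random_state=0)

# Selector and regressor live in one pipeline, so selection is redone per fold
pipe_demo = make_pipeline(
    SelectKBest(score_func=f_regression),
    RandomForestRegressor(random_state=0),
)

# make_pipeline names each step after its lowercased class name
param_grid = {
    "selectkbest__k": [5, 15],                        # number of features kept
    "randomforestregressor__n_estimators": [10, 30],  # small values for speed
}

search = GridSearchCV(pipe_demo, param_grid=param_grid, cv=kf_demo,
                      scoring="neg_mean_squared_error")
search.fit(X_demo, y_demo)

print(search.best_params_)
# Boolean mask of the features kept by the best pipeline
selected = search.best_estimator_.named_steps["selectkbest"].get_support()
print(int(selected.sum()))
```

With this layout a single `search.fit` call picks both the number of features and the forest size.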