(Python - sklearn) How to pass parameters to a custom ModelTransformer class via GridSearchCV

python-2.7 machine-learning scikit-learn

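Below is my pipeline. I can't seem to pass parameters to my models when I wrap them in the ModelTransformer class, which I took from a blog post (the URL is in the code below).

The error message makes sense to me, but I don't know how to fix it. Any idea how to solve this? Thanks.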

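from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

# define a pipeline
pipeline = Pipeline([
    ('vect', DictVectorizer(sparse=False)),
    ('scale', preprocessing.MinMaxScaler()),
    ('ess', FeatureUnion(
        n_jobs=-1,
        transformer_list=[
            ('rfc', ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100))),
            ('svc', ModelTransformer(SVC(random_state=1))),
        ],
        transformer_weights=None)),
    ('es', EnsembleClassifier1()),
])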

# define the parameters for the pipeline
parameters = {
'ess__rfc__n_estimators': (100, 200),
}

# ModelTransformer class, taken from
# http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html
from pandas import DataFrame
from sklearn.base import TransformerMixin
class ModelTransformer(TransformerMixin):
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return DataFrame(self.model.predict(X))

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, refit=True)

Error message:

ValueError: Invalid parameter n_estimators for estimator ModelTransformer.

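GridSearchCV has a special naming convention for nested objects. In your case, ess__rfc__n_estimators stands for ess.rfc.n_estimators, and, according to the definition of the pipeline, it points to

ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100))

Obviously, ModelTransformer instances have no such attribute.

The fix is easy: to reach the object wrapped by ModelTransformer, you need to go through its model field. So the grid parameters become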

parameters = {
  'ess__rfc__model__n_estimators': (100, 200),
}
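The naming rule can be checked without running a full grid search. The sketch below is a cut-down version of the pipeline, not the asker's exact code. It assumes ModelTransformer also inherits from BaseEstimator, so that get_params and set_params exist, and it drops the DictVectorizer, SVC and EnsembleClassifier1 steps for brevity. It lists the parameter names the pipeline accepts and sets the nested one.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler


class ModelTransformer(TransformerMixin, BaseEstimator):
    """Wraps a model so that its predictions can be used as features.

    Inheriting from BaseEstimator is an assumption of this sketch; it
    supplies get_params/set_params based on the __init__ signature.
    """

    def __init__(self, model):
        self.model = model

    def fit(self, X, y=None, **fit_params):
        self.model.fit(X, y, **fit_params)
        return self

    def transform(self, X):
        # One column holding the wrapped model's predictions.
        return self.model.predict(X).reshape(-1, 1)


pipeline = Pipeline([
    ('scale', MinMaxScaler()),
    ('ess', FeatureUnion([
        ('rfc', ModelTransformer(
            RandomForestClassifier(n_estimators=100, random_state=1))),
    ])),
])

# Every key of get_params(deep=True) is a name GridSearchCV can tune.
valid_names = pipeline.get_params(deep=True)
has_nested = 'ess__rfc__model__n_estimators' in valid_names
has_flat = 'ess__rfc__n_estimators' in valid_names

# Setting the nested name reaches the wrapped RandomForestClassifier.
pipeline.set_params(ess__rfc__model__n_estimators=200)
new_value = pipeline.get_params()['ess__rfc__model__n_estimators']
```

Here has_nested is True and has_flat is False: only the path that goes through the model attribute is a valid parameter name.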

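P.S. This is not the only problem with your code. In order to use multiple jobs in GridSearchCV, every object you use has to be copyable. This is achieved by implementing the methods get_params and set_params, which you can borrow from a mixin.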
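A minimal sketch of what "copyable" means in practice, assuming the copying is done with sklearn.base.clone and that the mixin in question is BaseEstimator (the class a commenter below reports using). PlainTransformer and CloneableTransformer are hypothetical names introduced for this illustration.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.svm import SVC


class PlainTransformer(TransformerMixin):
    """Hypothetical wrapper without get_params/set_params."""

    def __init__(self, model):
        self.model = model


class CloneableTransformer(TransformerMixin, BaseEstimator):
    """Same wrapper, but BaseEstimator supplies get_params/set_params."""

    def __init__(self, model):
        self.model = model


# clone() refuses objects that do not implement get_params.
try:
    clone(PlainTransformer(SVC()))
    plain_cloned = True
except TypeError:
    plain_cloned = False

# With BaseEstimator the wrapper and its inner model are both copied.
original = CloneableTransformer(SVC(C=2.0))
copied = clone(original)
same_setting = copied.model.C == 2.0
distinct_model = copied.model is not original.model
```

For this to work, __init__ should only store its arguments unchanged under the same names, since get_params reads them back from the attributes named in the signature.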

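Could you expand on this P.S.? I think I have the same problem. When I try to use GridSearchCV with a pipeline and a FeatureUnion, I get the error AttributeError: 'SelectColumns' object has no attribute 'get_params', where SelectColumns is a class I wrote for the pipeline.

@B_Miner, you should inherit your SelectColumns class from the mixin, which provides the set_params and get_params methods mentioned above. Alternatively, you can implement your own, but most of the time you don't want to.

I was looking for BaseMixin. I inherited from BaseEstimator and it works like a charm, thanks!

@Artem Sobolev: I'm doing the same thing. When I try to use cross_val_predict or GridSearchCV on the same pipeline, I get the error "cannot deepcopy this pattern object". Can you show how you used FeatureUnion?

Thanks for asking this; I had the same problem. Let me ask one more thing. Do you know why self.model.fit(*args, **kwargs) works? I mean, you don't usually pass hyperparameters such as n_estimators when calling the fit method, but when creating the instance, e.g. rfc = RandomForestClassifier(n_estimators=100) followed by rfc.fit(X, y).

@drake, when you create the ModelTransformer instance, you need to pass in the model together with its parameters, for example ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100)). Here self.model.fit(*args, **kwargs) mostly just means self.model.fit(X, y).

Thanks, @nkhuyu. I know that is how it works; I'm asking why. Since self.model = model, self.model is RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100). I understand that *args unpacks (X, y), but I don't understand why **kwargs is needed in the fit method when self.model already knows its hyperparameters.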
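The thread leaves that last question unanswered. One plausible reading, offered here as an assumption rather than as the original author's intent, is that **kwargs is not for hyperparameters at all: it forwards optional fit-time arguments, such as the sample_weight argument that RandomForestClassifier.fit accepts. The sketch below uses RecordingModel, a hypothetical stand-in estimator that records what its fit method receives, so the forwarding can be seen without training anything.

```python
class RecordingModel(object):
    """Hypothetical stand-in for an estimator; records fit-time arguments."""

    def __init__(self, n_estimators=100):
        # Hyperparameters are fixed when the object is constructed.
        self.n_estimators = n_estimators
        self.fit_kwargs = None

    def fit(self, X, y, **fit_kwargs):
        # Fit-time extras (for example sample_weight) arrive here.
        self.fit_kwargs = fit_kwargs
        return self


class ModelTransformer(object):
    """Wrapper from the question, reduced to its fit method."""

    def __init__(self, model):
        self.model = model

    def fit(self, *args, **kwargs):
        # Forward positional and keyword arguments unchanged.
        self.model.fit(*args, **kwargs)
        return self


X = [[0.0], [1.0], [2.0]]
y = [0, 1, 1]
wrapped = ModelTransformer(RecordingModel(n_estimators=100))

wrapped.fit(X, y)                      # the usual case: kwargs is empty
without_extras = wrapped.model.fit_kwargs

wrapped.fit(X, y, sample_weight=[1.0, 2.0, 1.0])
with_extras = wrapped.model.fit_kwargs
```

In the plain fit(X, y) call the keyword dictionary is empty, so **kwargs costs nothing; it only matters when a caller passes something extra at fit time.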