Machine learning: How do I combine sklearn estimators using another estimator?

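I want to train two estimators and combine their scores using a third one.

How should I set this up so that I can pass the result to … and …?

Also, I suppose I could define my own class that implements the fit and predict_proba methods, but I think there should be a standard way to do this…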

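No, sklearn has nothing built in that does what you want without writing some custom code. You can run parts of the work in parallel and chain the whole task into a sequence (the code below uses FeatureUnion and Pipeline for this), but you need to write a custom transformer that forwards the output of predict_proba to the transform method.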

Something like this:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# This is the custom transformer that will convert 
# predict_proba() to pipeline friendly transform()
# Mixins are listed before BaseEstimator, the order scikit-learn recommends
class PredictProbaTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, y):
        if self.clf is not None:
            self.clf.fit(X, y)

        return self

    def transform(self, X):

        if self.clf is not None:
            # Drop the 2nd column but keep 2d shape
            # because FeatureUnion wants that 
            return self.clf.predict_proba(X)[:,[0]]

        return X

    # This method is important for correct working of pipeline
    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X)

logit = LogisticRegression(random_state=0)
randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)

pipe = Pipeline([
    ('stack', FeatureUnion([
        ('logit', PredictProbaTransformer(logit)),
        ('randf', PredictProbaTransformer(randf)),
        # You can add more classifiers with the custom wrapper, as above
    ])),
    ('nb', GaussianNB()),
])

pipe.fit(X, y)

Now you can simply call pipe.predict, and every step will be applied in the right order.
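As a rough usage sketch: because the whole stack is a single Pipeline, it should behave like any other estimator. The snippet below repeats the wrapper so it runs on its own, then calls predict and predict_proba and hands the pipeline to cross_val_score. The choice of cross_val_score and cv=5 is an assumption for illustration, not something taken from the question.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import FeatureUnion, Pipeline


class PredictProbaTransformer(TransformerMixin, BaseEstimator):
    """Same wrapper as in the answer, repeated so this snippet is standalone."""

    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, y=None):
        if self.clf is not None:
            self.clf.fit(X, y)
        return self

    def transform(self, X):
        if self.clf is not None:
            # Keep only the first probability column, as a 2D array
            return self.clf.predict_proba(X)[:, [0]]
        return X


X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

pipe = Pipeline([
    ("stack", FeatureUnion([
        ("logit", PredictProbaTransformer(LogisticRegression(random_state=0))),
        ("randf", PredictProbaTransformer(
            RandomForestClassifier(n_estimators=100, max_depth=2,
                                   random_state=0))),
    ])),
    ("nb", GaussianNB()),
])

pipe.fit(X, y)

# One column per wrapped classifier, stacked side by side by FeatureUnion
stacked = pipe.named_steps["stack"].transform(X)

labels = pipe.predict(X)        # class labels from the final GaussianNB
proba = pipe.predict_proba(X)   # class probabilities from the final GaussianNB

# The pipeline can be cloned and refitted per fold like any estimator
scores = cross_val_score(pipe, X, y, cv=5)
print(stacked.shape, labels.shape, proba.shape, scores.mean())
```

Each fold refits both base classifiers and the final GaussianNB, so the scores describe the whole stack rather than only the last step.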

For more information on FeatureUnion, see my other answer to a similar question.


If you want to do blending/stacking, shouldn't you use a hold-out set, i.e. …
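One possible way to address the hold-out concern, sketched here as an alternative and not as the answerer's method: scikit-learn 0.22 and later provide sklearn.ensemble.StackingClassifier, which trains the final estimator on cross-validated predictions of the base estimators instead of predictions made on the data they were fitted on. The estimator names and cv=5 below are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

stack = StackingClassifier(
    estimators=[
        ("logit", LogisticRegression(random_state=0)),
        ("randf", RandomForestClassifier(n_estimators=100, max_depth=2,
                                         random_state=0)),
    ],
    final_estimator=GaussianNB(),    # combines the base classifiers' scores
    stack_method="predict_proba",    # feed probabilities to the final estimator
    cv=5,                            # out-of-fold predictions train the final estimator
)

stack.fit(X, y)

stack_labels = stack.predict(X)
stack_proba = stack.predict_proba(X)
print(stack_labels.shape, stack_proba.shape, stack.score(X, y))
```

Like the pipeline version, the fitted object exposes fit, predict and predict_proba, so it can be passed to the usual model-selection utilities.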