Python: How to produce reproducible results in a stacking model

python machine-learning

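After a lot of trial and error, I finally built my own stacking model, but I can't get the same results on every run. I know I have to initialize the random_state parameter to some fixed value, but even when I explicitly set random_state before calling the class methods, I still get random results.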

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.model_selection import KFold


class Stacking(BaseEstimator, ClassifierMixin):
    def __init__(self, BaseModels, MetaModel, nfolds = 3, seed = 1):
        self.BaseModels = BaseModels
        self.MetaModel = MetaModel
        self.nfolds = nfolds
        # np.random.seed() returns None, so keep the seed value itself on self.
        self.seed = seed
        np.random.seed(seed)  # <---- This fixed my error. Thanks to foladev.

    def fit(self, X, y):
        # One list of fitted fold models per base model.
        self.BaseModels_ = [list() for model in self.BaseModels]
        self.MetaModel_ = clone(self.MetaModel)
        # random_state is only accepted together with shuffle=True;
        # unshuffled folds are already deterministic.
        kf = KFold(n_splits = self.nfolds, shuffle = False)
        out_of_fold_preds = np.zeros((X.shape[0], len(self.BaseModels)))

        # Iterate over the unfitted base models, not the (empty) result lists.
        for index, model in enumerate(self.BaseModels):
            for train_index, out_of_fold_index in kf.split(X, y):
                instance = clone(model)
                self.BaseModels_[index].append(instance)
                instance.fit(X[train_index], y[train_index])

                preds = instance.predict(X[out_of_fold_index])
                out_of_fold_preds[out_of_fold_index, index] = preds
        self.MetaModel_.fit(out_of_fold_preds, y)
        return self
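
One plausible reason the np.random.seed(seed) call helps: scikit-learn estimators left at random_state=None draw from NumPy's global random generator, so seeding that generator pins them down too. The sketch below only illustrates this mechanism with sklearn.utils.check_random_state; the helper name draw is made up for the example.

```python
import numpy as np
from sklearn.utils import check_random_state


def draw(seed):
    """Seed NumPy's global generator, then draw the way an unseeded estimator would."""
    np.random.seed(seed)
    # check_random_state(None) hands back NumPy's global RandomState,
    # which is what estimators with random_state=None end up using.
    rng = check_random_state(None)
    return rng.randint(0, 1000, size=5)


first = draw(1)
second = draw(1)
# Both draws start from the same global state, so they match.
repeatable = np.array_equal(first, second)
```

Relying on the global generator is fragile, because any other code that draws random numbers in between shifts the state. Passing an explicit random_state to each estimator is usually the more robust choice.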

From the API, XGBClassifier appears to use "seed":
xgboost.XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='binary:logistic', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
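
Because the keyword can differ between libraries and versions, one option is to ask the estimator which seed-like parameter it accepts instead of hard-coding the name. This is a minimal sketch that assumes the estimator follows the scikit-learn get_params/set_params convention. The helpers seed_param_name and apply_seed, and the stand-in class LegacySeedModel, are hypothetical names used only for illustration.

```python
def seed_param_name(estimator, candidates=("random_state", "seed")):
    """Return the first seed-like parameter the estimator exposes, or None."""
    params = estimator.get_params()
    for name in candidates:
        if name in params:
            return name
    return None


def apply_seed(estimator, seed):
    """Set the seed under whichever name the estimator accepts."""
    name = seed_param_name(estimator)
    if name is not None:
        estimator.set_params(**{name: seed})
    return name


class LegacySeedModel:
    """Hypothetical stand-in for a model that only knows 'seed'."""

    def __init__(self, seed=None):
        self.seed = seed

    def get_params(self, deep=True):
        return {"seed": self.seed}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


model = LegacySeedModel()
used = apply_seed(model, 7)  # picks "seed" because "random_state" is absent
```

The same apply_seed call should work on any estimator exposing get_params and set_params, which avoids the "unexpected keyword argument" error when a constructor does not know one of the two names.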

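May I ask why you don't set a class-level seed and apply it to all methods?

What do you mean by a class-level seed? This is my first time writing a class, so I don't know how to do that. I'd appreciate an example. Also, can you tell me why I get "__init__() got an unexpected keyword argument 'random_state'" even though xgboost has that parameter? Please answer.

By a class-level seed, I mean accessing the class-level 'seed' variable (i.e. random_state) through "self". Note that your method signatures always include "self", which is the object itself, passed to the method so that it can access class members. In this case you would use self.random_seed instead of assigning a new number.

I'm not sure xgboost has a random_state variable. The API says "seed", not "random_state". As for your error in "__init__()", can you post the complete class so I can see how you subclassed it?

I haven't checked, but I bet there is no "random_state" variable there either. The variable name may be "seed" as well.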
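
The class-level seed suggested in the comments could look roughly like the sketch below. It is not the asker's final code: SeededStacking, _seeded_clone and oof_predictions_ are names made up for illustration, the toy data is synthetic, and it assumes every model follows the scikit-learn get_params/set_params convention and accepts random_state.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier


class SeededStacking(ClassifierMixin, BaseEstimator):
    """Stacking sketch that keeps one seed on the instance and reuses it."""

    def __init__(self, base_models, meta_model, nfolds=3, seed=1):
        # Only store the arguments; every method reads the seed via self.
        self.base_models = base_models
        self.meta_model = meta_model
        self.nfolds = nfolds
        self.seed = seed

    def _seeded_clone(self, model):
        # Clone the model and hand it the shared seed if it accepts one.
        instance = clone(model)
        if "random_state" in instance.get_params():
            instance.set_params(random_state=self.seed)
        return instance

    def fit(self, X, y):
        kf = KFold(n_splits=self.nfolds, shuffle=True, random_state=self.seed)
        oof = np.zeros((X.shape[0], len(self.base_models)))
        self.base_models_ = [[] for _ in self.base_models]
        for index, model in enumerate(self.base_models):
            for train_idx, oof_idx in kf.split(X, y):
                instance = self._seeded_clone(model)
                instance.fit(X[train_idx], y[train_idx])
                self.base_models_[index].append(instance)
                oof[oof_idx, index] = instance.predict(X[oof_idx])
        self.meta_model_ = self._seeded_clone(self.meta_model)
        self.meta_model_.fit(oof, y)
        self.oof_predictions_ = oof
        return self


# Synthetic toy data, used only to compare two independent fits.
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)


def build():
    return SeededStacking(
        [DecisionTreeClassifier(max_features=2), LogisticRegression()],
        LogisticRegression(),
        seed=42,
    )


first = build().fit(X, y)
second = build().fit(X, y)
same = np.array_equal(first.oof_predictions_, second.oof_predictions_)
```

Keeping the seed as a plain attribute and passing it down explicitly means the result no longer depends on NumPy's global random state, so unrelated code that draws random numbers cannot change the folds or the fitted models.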