How to handle imbalanced XGBoost multiclass classification in a scikit-learn pipeline? (Python, scikit-learn, XGBoost)

I am modeling an imbalanced multiclass target using XGBClassifier. I have a few questions:

First, I would like to know where I should use the weight parameter: when instantiating the classifier, or in the fit step of the pipeline?

Second, how do I calculate the weights? I assume that the sum of the array should be 1.

Third, is there any order of the weight array that maps to the different label classes?

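Regarding the first question:

"Where should I use the weight parameter?"

Use sample_weight in XGBClassifier.fit.

When using a pipeline, pass it to Pipeline.fit as a keyword argument prefixed with the step name and a double underscore, for example my_xgb_clf__sample_weight=sample_weight (see the pipe.fit snippet near the end of this page).

By the way, some sklearn APIs do not accept a sample_weight kwarg; learning_curve, for example, did not at the time this was written.

So I just patch the fit method like this: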

import functools
# Bind sample_weight so that every later call to xgb_clf.fit passes it automatically
xgb_clf.fit = functools.partial(xgb_clf.fit, sample_weight=sample_weight)

Note: you need to apply the patch again after a grid search, because GridSearchCV.best_estimator_ is a refitted clone and not the original estimator.
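As an alternative to re-patching fit after a grid search, fit-time keyword arguments can be passed directly to GridSearchCV.fit, which forwards them to the estimator and slices array-like values such as sample_weight per fold. The following is a minimal sketch and not the answer author's method. It assumes that GradientBoostingClassifier is an acceptable stand-in for XGBClassifier (so the sketch runs with scikit-learn alone); the data, the step name "classifier", and the parameter grid are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical imbalanced 3-class data
rng = np.random.RandomState(0)
X = rng.normal(size=(90, 4))
y = np.array([0] * 60 + [1] * 20 + [2] * 10)
sample_weight = compute_sample_weight("balanced", y)

# Stand-in classifier (assumption): swap in xgb.XGBClassifier() if xgboost is installed
pipe = Pipeline([
    ("classifier", GradientBoostingClassifier(n_estimators=10, random_state=0)),
])

search = GridSearchCV(pipe, {"classifier__max_depth": [2, 3]}, cv=3)
# The weights are routed to the "classifier" step for every CV fit and for the final refit
search.fit(X, y, classifier__sample_weight=sample_weight)

best = search.best_estimator_  # a refitted clone, not the original `pipe` object
```

Because the weights travel with the fit call, nothing has to be patched on best_estimator_ afterwards.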

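Regarding the second question:

"How do I calculate the weights? I assume that the sum of the array should be 1."

Use compute_sample_weight('balanced', y_train) from sklearn.utils (the last snippet on this page). This emulates class_weight='balanced' in sklearn.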
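To make the "balanced" heuristic concrete, here is a hand-written sketch of the formula scikit-learn documents for it, n_samples / (n_classes * count of the sample's class). The function name balanced_sample_weight and the label list are hypothetical and only serve the illustration; in practice compute_sample_weight does this for you.

```python
from collections import Counter


def balanced_sample_weight(y):
    """Return one weight per sample: n_samples / (n_classes * count of that sample's class)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return [n_samples / (n_classes * counts[label]) for label in y]


# Hypothetical imbalanced labels: 6 x "a", 3 x "b", 1 x "c"
y_train = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
sample_weight = balanced_sample_weight(y_train)

# Each class contributes n_samples / n_classes in total, so the weights
# add up to n_samples rather than to 1.
total = sum(sample_weight)
```

Rare classes get larger weights than frequent ones, and the array has the same length as y_train.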

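Notes:

The sum of the array is not 1. You could normalize it, but I think the scoring results would then differ.

This is not equivalent to class_weight='balanced_subsample'. I could not find a way to emulate that.

Regarding the third question: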

"Is there any order?"

Sorry, I do not understand what you mean.

Maybe you want the order of xgb_clf.classes_? You can access it after calling xgb_clf.fit.

Or just use np.unique(y_train).
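One point that may resolve the ordering question: sample_weight holds one entry per training row, aligned with y_train, so it does not depend on any class ordering. A class ordering only matters if you want to inspect per-class weights. The sketch below illustrates both, assuming a small made-up label array; np.unique returns the labels in sorted order.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight

# Hypothetical labels, for illustration only
y_train = np.array(["cat", "dog", "cat", "bird", "cat", "dog"])

# Sorted unique labels; per-class weights are returned in this same order
classes = np.unique(y_train)
class_weight = compute_class_weight("balanced", classes=classes, y=y_train)
weight_by_class = dict(zip(classes, class_weight))

# Per-sample weights: entry i belongs to row i of y_train
sample_weight = compute_sample_weight("balanced", y_train)
```

Looking up weight_by_class[label] for each row gives the same values as sample_weight, which is the array to pass to fit.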

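About the answer to the first question: when I use pipe.fit(X, y, my_xgb_clf__sample_weight=sample_weight), my kernel dies... When I use xgb_clf.fit = functools.partial(xgb_clf.fit, sample_weight=sample_weight), I get the following error: NotFittedError: This ColumnTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

@Hugo This error is raised by ColumnTransformer: ColumnTransformer.transform is being called somewhere before ColumnTransformer.fit. How are you using it?

preprocessor = ColumnTransformer(transformers=[('num', numeric_transformer, numeric_features), ('cat', categorical_transformer, categorical_features)]) clf = Pipeline(steps=[('preprocessor', preprocessor), ('classifier', XGBClassifier())])

@Hugo The example runs fine on my machine (Python 3.7, sklearn 0.21.2) after changing the last line to clf.fit(X_train, y_train, classifier__sample_weight=sample_weight). I cannot reproduce the error.

Thanks. It may be a different bug. I am trying to find out whether it is related to scikit-learn.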
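The setup discussed in the comments above can be sketched end to end as follows. This is only an illustration under stated assumptions: the data is random, the columns are addressed by integer position, StandardScaler and OneHotEncoder are guesses at what numeric_transformer and categorical_transformer were, and GradientBoostingClassifier stands in for XGBClassifier so that the sketch runs with scikit-learn alone.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical data: two numeric columns and one categorical column coded 0/1/2
rng = np.random.RandomState(0)
n = 120
X_train = np.column_stack([
    rng.normal(size=n),
    rng.normal(size=n),
    rng.randint(0, 3, size=n),
])
y_train = np.array([0] * 80 + [1] * 30 + [2] * 10)

numeric_features = [0, 1]
categorical_features = [2]

preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), numeric_features),                             # assumed transformer
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),   # assumed transformer
])

clf = Pipeline(steps=[
    ("preprocessor", preprocessor),
    # Stand-in (assumption): replace with XGBClassifier() if xgboost is installed
    ("classifier", GradientBoostingClassifier(n_estimators=10, random_state=0)),
])

sample_weight = compute_sample_weight("balanced", y_train)

# Pipeline.fit fits the preprocessor first, then hands the weights to the "classifier" step
clf.fit(X_train, y_train, classifier__sample_weight=sample_weight)
pred = clf.predict(X_train)
```

Calling fit on the whole pipeline, rather than on the classifier alone, ensures that ColumnTransformer.fit runs before ColumnTransformer.transform.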

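# Passing sample_weight through a pipeline: prefix it with the step name and "__"
import xgboost as xgb
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ('my_xgb_clf', xgb.XGBClassifier()),
])
pipe.fit(X, y, my_xgb_clf__sample_weight=sample_weight)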

# Per-sample weights that emulate class_weight='balanced'
from sklearn.utils import compute_sample_weight
sample_weight = compute_sample_weight('balanced', y_train)