Python: xgb.cv's AUC score does not match cross_val_score when 'colsample_bytree' is not 1

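I am working with a highly imbalanced dataset. During hyperparameter tuning, I found that when colsample_bytree is set to any value other than 1, the score from sklearn's cross_val_score does not match the AUC score obtained from xgb.cv.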

xgb.cv code:

import xgboost as xgb
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier

# X_train_p and y_train are the training features and labels, defined elsewhere

# create the stratified k-folds
kfolds = StratifiedKFold(n_splits=5, shuffle=True, random_state=16)

# create the model object and reuse its parameters for xgb.cv
xgb0 = XGBClassifier(objective='binary:logistic', n_estimators=2, colsample_bytree=0.6,
                     random_state=16, n_jobs=-1, eval_metric='auc')
params = xgb0.get_params()
xg_train = xgb.DMatrix(X_train_p.values, label=y_train.values)
cv_result = xgb.cv(params, xg_train, num_boost_round=2, folds=kfolds, metrics='auc',
                   early_stopping_rounds=50, as_pandas=True, seed=16,
                   stratified=True, shuffle=True)
print(cv_result['test-auc-mean'].values[-1])
This gives a mean test AUC of 0.91706.

cross_val_score code:

from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

# X_train_p and y_train are the training features and labels, defined elsewhere

# create the stratified k-folds (same settings as for xgb.cv)
kfolds = StratifiedKFold(n_splits=5, shuffle=True, random_state=16)

# create the model object and score it with cross_val_score
xgb0 = XGBClassifier(objective='binary:logistic', n_estimators=2, colsample_bytree=0.6,
                     random_state=16, n_jobs=-1, eval_metric='auc')
cv_score = cross_val_score(xgb0, X_train_p, y_train, cv=kfolds, n_jobs=-1, scoring='roc_auc')
print(cv_score.mean())
This gives a mean test AUC of 0.8994.

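I don't understand the large gap between the two. As I already noted, when colsample_bytree is set to 1 there is no difference between the scores. Moreover, the difference between the AUC scores grows significantly as colsample_bytree is decreased.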

Can someone help me understand why this happens? Thanks.
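
For illustration only, here is a toy sketch of one possible contributor to the pattern described above. This is an assumption, not a confirmed diagnosis. With a column fraction of 1 there is nothing random about which features a tree sees, so any difference in random state between the two code paths could not show up through column sampling. With a fraction below 1, the chosen subset depends on the random state. The function sample_columns is hypothetical and is not XGBoost's actual sampler.

```python
import random


def sample_columns(n_features, colsample, seed):
    """Toy stand-in for per-tree column subsampling (NOT XGBoost's real sampler)."""
    rng = random.Random(seed)
    k = max(1, round(n_features * colsample))
    return sorted(rng.sample(range(n_features), k))


n_features = 10  # hypothetical feature count

# With colsample = 1.0 every column is kept, so the seed cannot matter.
full_a = sample_columns(n_features, 1.0, seed=16)
full_b = sample_columns(n_features, 1.0, seed=0)
print(full_a == full_b)

# With colsample = 0.6 the chosen subset depends on the random state.
distinct_subsets = {tuple(sample_columns(n_features, 0.6, seed=s)) for s in range(20)}
print(len(distinct_subsets))
```

If the two code paths end up with different effective random states, something like this would be consistent with scores that agree at colsample_bytree = 1 and drift apart as it decreases. Whether that is what happens between xgb.cv and cross_val_score here would still need to be checked.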