Scikit-learn: error when fitting XGBoost with categorical features in a pipeline

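I am running XGBoost through a pipeline. I have many categorical features, and I one-hot encode them inside the pipeline, but I still end up with the error "ValueError: DataFrame.dtypes for data must be int, float or bool". If the one-hot encoder has already converted the categorical features to numbers, why does this error occur?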

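import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBClassifier

# selecting numerical features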
numeric_features = X_train.select_dtypes(include=np.number).columns

# selecting categorical features
categorical_features = X_train.select_dtypes(exclude=np.number).columns

# scaling pipeline for numerical features
numeric_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')),  
                                      ('scaler', StandardScaler())])                 

# scaling and encoding pipeline for categorical features
categorical_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='constant', fill_value='Missing')), 
                                         ('onehot', OneHotEncoder(handle_unknown='ignore'))])   

# combine the preprocessing steps into a single transformer
preprocessor = ColumnTransformer(transformers=[('num', numeric_transformer, numeric_features),
                                               ('cat', categorical_transformer, categorical_features)])

# setting up the pipeline
pipe = Pipeline(steps=[('preprocessor', preprocessor),
                  ('xgb', XGBClassifier(random_state=10))])

param_grid = {
             "xgb__n_estimators": [100, 500, 700],
             "xgb__learning_rate": [0.001, 0.1, 0.5, 1],
             "xgb__max_depth" : [4, 5],
             "xgb__alpha": [0, 0.25, 0.5, 0.75, 1],
             "xgb__lambda": [0, 0.2, 0.4, 0.6, 0.8, 1]
             }

fit_params = {"xgb__eval_set": [(X_test, y_test)],
             "xgb__early_stopping_rounds": 10, 
             "xgb__verbose": False} 

xgbmodel = GridSearchCV(pipe, cv=5, param_grid=param_grid, scoring='accuracy')
xgbmodel.fit(X_train, y_train, **fit_params)  

print(xgbmodel.best_params_)
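
One likely explanation, assuming the error is raised while XGBoost reads `eval_set`: fit parameters prefixed with `xgb__` are forwarded to `XGBClassifier.fit` as they are, so the `X_test` inside `eval_set` never passes through the preprocessor and still holds its raw string columns. The sketch below uses only pandas and scikit-learn on made-up toy data (the `age` and `city` columns are hypothetical) to show the difference between the raw frame and the output of the same preprocessor.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the real X_train / X_test.
X_train = pd.DataFrame({"age": [25.0, 32.0, np.nan, 41.0],
                        "city": ["Paris", "Rome", "Rome", "Paris"]})
X_test = pd.DataFrame({"age": [29.0, 50.0],
                       "city": ["Rome", "Berlin"]})  # "Berlin" is unseen in training

numeric_features = X_train.select_dtypes(include=np.number).columns
categorical_features = X_train.select_dtypes(exclude=np.number).columns

# Same preprocessing layout as in the question.
preprocessor = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[("imputer", SimpleImputer(strategy="median")),
                            ("scaler", StandardScaler())]), numeric_features),
    ("cat", Pipeline(steps=[("imputer", SimpleImputer(strategy="constant", fill_value="Missing")),
                            ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_features),
])

# Fit on the training data only, then encode the evaluation data with it.
preprocessor.fit(X_train)
X_test_encoded = preprocessor.transform(X_test)

# The raw frame still contains a non-numeric column; the encoded matrix does not.
raw_has_non_numeric = X_test.select_dtypes(exclude=np.number).shape[1] > 0
encoded_is_numeric = bool(np.issubdtype(X_test_encoded.dtype, np.number))
print(raw_has_non_numeric, encoded_is_numeric, X_test_encoded.shape)
```

Under that assumption, the encoded matrix could be passed instead of the raw frame, for example `"xgb__eval_set": [(X_test_encoded, y_test)]`. Note that `GridSearchCV` refits the preprocessor on each fold, so a preprocessor fitted once on all of `X_train` only approximates what the pipeline sees during each fit.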