Python: Extracting feature importances with feature names from an sklearn Pipeline

python python-3.x scikit-learn

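I would like to know how to extract feature importances, together with the feature names, from a random forest in scikit-learn when the classifier is used inside a pipeline that also does the preprocessing.
An existing question here deals only with extracting the feature importances.
From the brief research I have done, this does not seem to be possible in scikit-learn, but I hope I am wrong.
I also found a package called ELI5 that is supposed to solve this for scikit-learn, but it did not solve my problem: the feature names it printed were x1, x2, and so on, rather than the actual feature names.
As a workaround, I did all of the preprocessing outside the pipeline, but I would really like to know how to do it inside the pipeline.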

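If there is any code I could provide that would help, let me know in the comments.

Here is an example that uses XGBoost and gets the feature importances: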

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

# X, X_train and y_train are assumed to exist already; X is a DataFrame
# whose categorical columns have dtype 'category'.
numerical_columns = X.columns[X.dtypes != 'category'].tolist()
categorical_columns = X.columns[X.dtypes == 'category'].tolist()

num_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', preprocessing.RobustScaler())])

# sparse_output needs scikit-learn >= 1.2; older releases call it `sparse`.
cat_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', preprocessing.OneHotEncoder(categories='auto',
                                           sparse_output=False,
                                           handle_unknown='ignore'))])

pipeline_procesado = ColumnTransformer(transformers=[
    ('numerical_preprocessing', num_transformer, numerical_columns),
    ('categorical_preprocessing', cat_transformer, categorical_columns)],
    remainder='passthrough',
    verbose=True)

# Create the classifier
classifier = XGBClassifier()

# Create the overall model as a single pipeline
pipeline = Pipeline([("transform_inputs", pipeline_procesado),
                     ("classifier", classifier)])
pipeline.fit(X_train, y_train)

# Names of the one-hot encoded columns, taken from the fitted encoder.
# get_feature_names_out needs scikit-learn >= 1.0; older releases used get_feature_names.
onehot_columns = (pipeline.named_steps['transform_inputs']
                  .named_transformers_['categorical_preprocessing']
                  .named_steps['onehot']
                  .get_feature_names_out(categorical_columns))

# You can get the values transformed with your pipeline
# (the transformer was already fitted by pipeline.fit above)
X_values = pipeline_procesado.transform(X_train)
df_from_array_pipeline = pd.DataFrame(
    X_values, columns=numerical_columns + list(onehot_columns))

# Pair each importance with its column name
feature_importance = pd.Series(
    data=pipeline.named_steps['classifier'].feature_importances_,
    index=np.array(numerical_columns + list(onehot_columns)))

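I think it really depends on what you mean by preprocessing... could you be more specific? According to the documentation, a feature-names option is available in some cases.
It would help if you showed the code you are using and want to turn into a pipeline.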