Python: Extracting selected feature names from a scikit-learn pipeline


python numpy scikit-learn


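The code below is based on an existing example. Since I am using SelectFromModel, I would like to print the names of the selected features (from the SelectFromModel step of the pipeline), but I am not sure how to extract them.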

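One way is to call the feature selector's transform() on the feature names, but the names have to be presented as a list of examples (a 2D structure with one row per example).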

First, you have to get the feature selection stage from the best estimator found by GridSearchCV, which was fitted as follows:

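from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Forest used only to rank features for the selection step
rf_feature_imp = RandomForestClassifier(n_estimators=100)
feat_selection = SelectFromModel(rf_feature_imp, threshold=0.5)

# Final classifier trained on the selected features
clf = RandomForestClassifier(n_estimators=5000)

model = Pipeline([
    ('fs', feat_selection),
    ('clf', clf),
])

# 'auto' was removed from max_features in scikit-learn 1.3; drop it on newer versions
params = {
    'fs__threshold': [0.5, 0.3, 0.7],
    'fs__estimator__max_features': ['auto', 'sqrt', 'log2'],
    'clf__max_features': ['auto', 'sqrt', 'log2'],
}

gs = GridSearchCV(model, params, ...)
gs.fit(X, y)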

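With the search fitted, the selector is available as a named step of the best estimator:

fs = gs.best_estimator_.named_steps['fs']

Create a list of examples from the feature names:

feature_names_example = [iris.feature_names]

Then transform this example with the feature selector by calling fs.transform(feature_names_example).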

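The selector has a get_support() method that returns a boolean mask for the selected features. So you can do the following (in addition to the preliminary steps described by @David Maust), assuming X is a pandas DataFrame so that X.columns exists:

s = model.named_steps['fs'].fit(X, y)

X.columns[s.get_support()]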
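The DataFrame variant above can be sketched end to end as follows. This is a minimal illustration, not the answerer's exact code: it assumes pandas is installed, wraps the iris array in a DataFrame, and uses threshold="median" and random_state=0 (both choices made here for the example) so that some features are always selected and the result is reproducible.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
# Assumption for this sketch: X is a DataFrame whose columns are the feature names.
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# threshold="median" keeps the features whose importance is at least the median.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=50, random_state=0),
    threshold="median",
).fit(X, y)

# get_support() returns a boolean mask aligned with the columns of X.
selected_columns = X.columns[selector.get_support()]
print(list(selected_columns))
```

Indexing X.columns with the mask keeps the names in their original column order.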

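In this code, the value 0.7 for fs__threshold causes the following error for me with scikit-learn 0.17.1 and Python 2.7 when loading the iris dataset. The line gs.fit(X, y) produces:

C:\Python27\lib\site-packages\sklearn\feature_selection\base.py:80: UserWarning: No features were selected: either the data is too noisy or the selection test too strict.
  UserWarning)
Traceback (most recent call last):
ValueError: Found array with 0 feature(s) (shape=(99, 0)) while a minimum of 1 is required.

I found that if I remove 0.7, the code runs as expected. It seems strange, but at least it runs.

That would make sense if no feature has an importance greater than 0.7, which would not be surprising. RandomForestClassifier is also not deterministic if no random_state is given.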
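To check whether a numeric threshold such as 0.7 can select anything at all, it helps to look at the fitted importances directly. The sketch below is an illustration under assumed settings (100 trees, random_state=0, both chosen here): random forest importances are normalized to sum to 1, so with four iris features a single importance of 0.7 or more is unlikely.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Fixing random_state makes the importances reproducible between runs.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
selector = SelectFromModel(forest, threshold=0.7).fit(X, y)

# The fitted forest is exposed as estimator_; its importances sum to 1.
importances = selector.estimator_.feature_importances_
print(importances)
print(importances.max())  # compare this value with the threshold

# A feature is kept only if its importance is at least the threshold.
n_selected = int(selector.get_support().sum())
print(n_selected)
```

If n_selected is 0, the next pipeline step receives an array with no columns, which is the situation the ValueError above describes.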

selected_features = fs.transform(feature_names_example)

print(selected_features[0])  # Select the one example
# ['sepal length (cm)' 'petal length (cm)' 'petal width (cm)']

feature_names = np.array(iris.feature_names)
selected_features = feature_names[fs.get_support()]
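Putting the pieces together, the get_support() approach can be sketched as a complete script. Several things here are assumptions made for the example rather than part of the original code: the grid is reduced to the string thresholds "mean" and "median" (which always keep at least one feature, avoiding the empty-array error), the forests use 50 trees with random_state=0, and cv=3 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

iris = load_iris()
X, y = iris.data, iris.target

pipeline = Pipeline([
    ("fs", SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# String thresholds are relative to the fitted importances, so the most
# important feature always passes and the classifier never sees 0 columns.
param_grid = {"fs__threshold": ["mean", "median"]}

gs = GridSearchCV(pipeline, param_grid, cv=3).fit(X, y)

# The refitted best pipeline exposes its steps by name.
fs = gs.best_estimator_.named_steps["fs"]
mask = fs.get_support()
selected_names = np.array(iris.feature_names)[mask]
print(selected_names)
```

Compared with calling transform() on the names, the mask only indexes an array, so it does not depend on how the selector validates non-numeric input.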