Feature selection in a scikit-learn pipeline: how to determine the feature names?


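I am using a Pipeline and GridSearchCV to select the best parameters, and then use those parameters to fit the best pipeline ("best_pipe"). However, because the feature selection step (SelectKBest) lives inside the pipeline, no fit has been applied to a SelectKBest object that I can inspect directly.

I need to know the names of the "k" selected features. Is there a way to retrieve them? Thank you in advance.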

from sklearn import (feature_selection, linear_model, model_selection,
                     pipeline, preprocessing)

# X is a pandas DataFrame of features and y the matching target Series
folds = 5
split = model_selection.StratifiedKFold(n_splits=folds, shuffle=False)

scores = []
for k, (train, test) in enumerate(split.split(X, y)):

    # train/test are positional indices, so use .iloc
    X_train, X_test, y_train, y_test = X.iloc[train], X.iloc[test], y.iloc[train], y.iloc[test]

    top_feat = feature_selection.SelectKBest()

    pipe = pipeline.Pipeline([('scaler', preprocessing.StandardScaler()),
                              ('feat', top_feat),
                              # liblinear supports both the 'l1' and 'l2' penalties
                              ('clf', linear_model.LogisticRegression(solver='liblinear'))])

    K = [40, 60, 80, 100]
    C = [1.0, 0.1, 0.01, 0.001, 0.0001, 0.00001]
    penalty = ['l1', 'l2']

    param_grid = [{'feat__k': K,
                   'clf__C': C,
                   'clf__penalty': penalty}]

    scoring = 'precision'

    gs = model_selection.GridSearchCV(estimator=pipe, param_grid=param_grid, scoring=scoring)
    gs.fit(X_train, y_train)

    best_score = gs.best_score_
    scores.append(best_score)

    print("Fold: {} {} {:.4f}".format(k + 1, scoring, best_score))
    print(gs.best_params_)

You can access the feature selector by name in best_pipe:
features = best_pipe.named_steps['feat']

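You can then call transform() on an array of column indices to get the names of the selected columns (transform() expects a 2-D array, hence the reshape to a single row):
X.columns[features.transform(np.arange(len(X.columns)).reshape(1, -1))[0]]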

The output here will be the 80 column names selected in the pipeline.
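The index-array trick can be tried end to end on a small synthetic dataset. This is only a sketch under stated assumptions: the column names f0..f9, the data from make_classification, k=3, and the names X_demo, y_demo and demo_pipe are all made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with hypothetical column names f0..f9 (illustration only).
X_arr, y_demo = make_classification(n_samples=200, n_features=10,
                                    n_informative=3, random_state=0)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(10)])

demo_pipe = Pipeline([("scaler", StandardScaler()),
                      ("feat", SelectKBest(k=3)),
                      ("clf", LogisticRegression())])
demo_pipe.fit(X_demo, y_demo)

selector = demo_pipe.named_steps["feat"]

# Push one row holding the column positions 0..9 through the fitted selector.
# The selector keeps the same columns it kept during fit, so what comes out
# are the positions of the selected features.
positions = np.arange(X_demo.shape[1]).reshape(1, -1)
kept = selector.transform(positions)[0]

selected_names = X_demo.columns[kept]
print(list(selected_names))
```

The selector never looks at the values at transform time; it only slices columns, which is why feeding it positions instead of data returns the selected positions.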
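This might be an instructive alternative: I ran into a need similar to what the OP asked about. If you want the indices of the k best features directly from GridSearchCV: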
finalFeatureIndices = gs.best_estimator_.named_steps["feat"].get_support(indices=True)

You can then get your final feature list by indexing into the initial list:
finalFeatureList = [initialFeatureList[i] for i in finalFeatureIndices]
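For reference, the two lines above fit into a complete run roughly as follows. This is a hedged sketch: the synthetic data, the names col_0..col_7, the small grid over feat__k, and cv=3 are assumptions chosen to keep the example fast, not the OP's setup.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data and feature names, for illustration only.
X_arr, y_arr = make_classification(n_samples=200, n_features=8,
                                   n_informative=3, random_state=0)
initial_feature_list = [f"col_{i}" for i in range(8)]

pipe = Pipeline([("scaler", StandardScaler()),
                 ("feat", SelectKBest()),
                 ("clf", LogisticRegression())])

# With the default refit=True, best_estimator_ is the pipeline refitted on
# all of the data passed to fit(), using the best parameter combination.
search = GridSearchCV(pipe, {"feat__k": [2, 4]}, cv=3)
search.fit(X_arr, y_arr)

best_k = search.best_params_["feat__k"]
final_indices = search.best_estimator_.named_steps["feat"].get_support(indices=True)
final_feature_list = [initial_feature_list[i] for i in final_indices]

print(best_k, final_feature_list)
```

Because the refitted pipeline is already stored on the search object, there is no need to build and fit a separate best pipeline just to read off the selected features.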

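Jake's answer is totally correct. But depending on which feature selector you are using, I think there is another option that is cleaner. This one worked for me: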
X.columns[features.get_support()]

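It gave me the same answer as Jake's. You can read more about it in the scikit-learn documentation, but get_support returns an array of True/False values indicating whether each column was used. It is also worth noting that X must have the same shape (the same columns, in the same order) as the training data used to fit the feature selector.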
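A small sketch of what the boolean mask looks like and how the shape requirement can be checked. The five column names, the random data, and the way y_small is built are assumptions made up for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical frame with five named columns; the target depends on "b" and "d".
rng = np.random.RandomState(0)
X_small = pd.DataFrame(rng.normal(size=(100, 5)),
                       columns=["a", "b", "c", "d", "e"])
y_small = (X_small["b"] + X_small["d"] > 0).astype(int)

sel = SelectKBest(f_classif, k=2).fit(X_small, y_small)

# One boolean per input column: True where the column was kept.
mask = sel.get_support()
print(mask)

# The mask only lines up with the columns if the frame has the same number
# of columns (in the same order) as the data the selector was fitted on.
if mask.shape[0] != X_small.shape[1]:
    raise ValueError("X does not match the data used to fit the selector")

selected = list(X_small.columns[mask])
print(selected)
```

Indexing a frame with extra or reordered columns would either raise an error or silently return the wrong names, so a length check like the one above is a cheap safeguard.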
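It's a pleasure to get a solution from you, Jake; your PyCon tutorial videos helped me learn Python. However, I get the error "could not convert string to float: score575-600" (score575-600 is the name of one of the columns). How can I resolve this?

Ah, I forgot that the feature selector doesn't work on strings. Try the updated version above. Glad to hear the videos were helpful!

Still not sure how to avoid the error above, but this two-step solution at least gives me the column names of the k best features: features = best_pipe.named_steps['feat'].get_support(), then x_cols = X.columns.values[features == True].

Great, the updated version works!!! Though it's not clear how or why... I posted my comment before refreshing, so I hadn't seen the updated version.

Definitely prefer this answer; features.transform(np.arange(len(X.columns))) is essentially doing the same thing as features.get_support().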