Python: finding the top 3 feature importances using an ensemble voting classifier

python machine-learning scikit-learn

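I have a classification problem in which I have to find the top 3 features using a voting classifier approach with PCA, xgboost, random forest, logistic regression, and decision tree.

I am a beginner and I don't know how to use a voting classifier to get feature importances.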

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier

log_clf = LogisticRegression(random_state=2)

rnd_clf = RandomForestClassifier(
    n_estimators=150, max_depth=3, min_samples_leaf=6,
    max_features=0.3, n_jobs=-1, random_state=2)

gbm_clf = GradientBoostingClassifier(
    n_estimators=150, max_depth=3, min_samples_leaf=3, max_features=0.3,
    learning_rate=0.05, subsample=0.4, random_state=2)

estimators = [('lr', log_clf), ('rf', rnd_clf), ('gbm', gbm_clf)]

voting_clf = VotingClassifier(estimators=estimators, voting='hard')

# train is a pandas DataFrame that contains a 'target' column
voting_clf.fit(train.drop(columns=['target']), train['target'])

Expected: it should give me the feature importances of the variables using a voting classifier with PCA, xgboost, DT, RF, and LR.

You can access the underlying classifiers from the voting object and extract the feature importances of those classifiers.
For example:
for alg in voting_clf.named_estimators:
    clf = voting_clf.named_estimators[alg]
    # extract feature importance for clf
    # Note different algorithms have different 
    # methods for feature importance

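Since you are ensembling algorithms whose notions of "feature importance" are fundamentally different, I don't think there is a well-defined way to determine which features matter most in the ensemble's result.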
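As a rough illustration of the loop above, the sketch below reads importances from the fitted copies in `named_estimators_` (note the trailing underscore) and picks the top 3 per model. It also computes permutation importance on the whole voting classifier, which is a different, model-agnostic technique and not something the original post used. The synthetic data, the names `feature_names` and `top_k`, and the use of absolute logistic-regression coefficients as an importance score are all assumptions made for the example; coefficient magnitudes are only comparable when the features share a similar scale.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the asker's data (assumption): 6 features.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_redundant=0, random_state=2)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000, random_state=2)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=2)),
        ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=2)),
    ],
    voting="hard",
).fit(X, y)


def top_k(scores, k=3):
    """Return the k feature names with the largest scores, best first."""
    order = np.argsort(scores)[::-1][:k]
    return [feature_names[i] for i in order]


# Per-model view: read from the fitted estimators in named_estimators_.
per_model_top3 = {}
for name, clf in voting_clf.named_estimators_.items():
    if hasattr(clf, "feature_importances_"):   # tree-based models
        scores = clf.feature_importances_
    else:                                      # linear model: |coefficients|
        scores = np.abs(clf.coef_).ravel()
    per_model_top3[name] = top_k(scores)

# Ensemble view: permutation importance treats voting_clf as a black box.
result = permutation_importance(voting_clf, X, y, n_repeats=5, random_state=2)
ensemble_top3 = top_k(result.importances_mean)

print(per_model_top3)
print(ensemble_top3)
```

The per-model rankings need not agree with one another, which is the point made above; the permutation-based ranking is one way to get a single list for the ensemble, and it is best computed on held-out data rather than the training set used here.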
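I ran into the same problem. However, Robert King's approach did not work for me, because VotingRegressor (I am doing regression) has several fields that hold estimators, and the ones in the field called named_estimators are not fitted, so feature importances cannot be extracted from them. The second picture shows what one of the named estimators looks like.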


The proper field, the one that holds the fitted estimators, is called estimators_.

Here is the code to get all the importances:
import catboost
import lightgbm
import xgboost

def __get_feature_importances(self, train_columns):
    # Method of a wrapper class; self.model is the fitted VotingRegressor.
    feature_imp = dict()
    # estimators_ holds the fitted estimators, without their names
    for est in self.model.estimators_:
        if isinstance(est, catboost.core.CatBoostRegressor):
            feature_imp['catboost'] = dict(zip(train_columns, est.feature_importances_))
        elif isinstance(est, lightgbm.sklearn.LGBMRegressor):
            feature_imp['lgbm'] = dict(zip(train_columns, est.feature_importances_))
        elif isinstance(est, xgboost.sklearn.XGBRegressor):
            feature_imp['xgboost'] = dict(zip(train_columns, est.feature_importances_))
    return feature_imp

We have to tell them apart by type, because the fitted estimators in estimators_ carry no names.
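A possible alternative to the type checks, sketched under assumptions: the fitted estimators in `estimators_` come in the same order as the `(name, estimator)` pairs passed to the constructor, so the two can be zipped to recover the names. The sketch uses scikit-learn's own regressors as stand-ins for CatBoost, LightGBM, and XGBoost so that it runs without extra packages; the synthetic data and the `col0` ... `col4` column names are made up, and it assumes no estimator was set to `'drop'`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)

# Synthetic stand-in data (assumption) with hypothetical column names.
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       random_state=0)
train_columns = [f"col{i}" for i in range(X.shape[1])]

model = VotingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=30, random_state=0)),
        ("gbm", GradientBoostingRegressor(n_estimators=30, random_state=0)),
    ]
).fit(X, y)

# model.estimators: the (name, unfitted estimator) pairs given above.
# model.estimators_: the fitted clones, in the same order.
names = [name for name, _ in model.estimators]
feature_imp = {
    name: dict(zip(train_columns, fitted.feature_importances_))
    for name, fitted in zip(names, model.estimators_)
}

for name, importances in feature_imp.items():
    print(name, importances)
```

This keeps the names chosen when the ensemble was built, so two estimators of the same type no longer overwrite each other in the result.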
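I don't understand how you are using PCA. You import it in the sample code above but don't use it anywhere. Also, PCA is a dimensionality reduction method, not a classification algorithm.

Yes, because I don't know how to use PCA in a voting classifier. I was confused myself when I was given this problem, so I have just been searching around.

One possibility is to use PCA to reduce the dimensionality to 3 before applying the other classifiers (see the user guide for an example), but that is not really what you asked for. Is this a homework problem? Can you give us more details about what you are trying to achieve?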
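The suggestion in the last comment could look roughly like the sketch below: PCA reduces the data to 3 components inside a pipeline, and the voting classifier is trained on those components. The synthetic data, the scaling step, and the choice of member classifiers are assumptions for illustration. Note that the 3 inputs seen by the classifiers are then principal components, each a mix of all original features, so this does not by itself yield "the top 3 original features".

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 8 original features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=2)

voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000, random_state=2)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=2)),
        ("dt", DecisionTreeClassifier(max_depth=3, random_state=2)),
    ],
    voting="hard",
)

# Scale, project onto 3 principal components, then vote.
pipe = make_pipeline(StandardScaler(), PCA(n_components=3), voter)
pipe.fit(X, y)

# Each row of components_ holds the loadings of one component
# on the 8 original features.
pca = pipe.named_steps["pca"]
print(pca.components_.shape)
print(pca.explained_variance_ratio_)
print(pipe.score(X, y))
```

Inspecting `pca.components_` shows how strongly each original feature contributes to each component, which is about as close as this setup gets to attributing importance back to the original columns.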