Python: How can I see the model.feature_importances_ output together with the feature names?

python numpy scikit-learn

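I've built a DecisionTreeClassifier model in Python and want to see the importance of each feature. Because I'm using sklearn, I converted all of the classes to numbers. Here is how I imported the data: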

import pandas as pd

raw_data = pd.read_csv('Video_Games_Sales_as_at_22_Dec_2016.csv')
no_na_df = raw_data.dropna(how='any')

After getting rid of the NAs, I created a DataFrame for the numeric conversion:

numeric_df = no_na_df.copy()
cols = ['Platform','Genre','Publisher','Developer','Rating']
numeric_df[cols] = numeric_df[cols].apply(lambda x: pd.factorize(x)[0]+1)
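The factorize call above keeps only the integer codes and throws away the original labels. As a minimal sketch (the tiny DataFrame and the names `codes`, `uniques`, and `mapping` are illustrative assumptions, not part of the original post), the second return value of `pd.factorize` can be kept so the numbers can be mapped back to category names later:

```python
import pandas as pd

# Hypothetical miniature stand-in for one categorical column of the real data
df = pd.DataFrame({"Platform": ["Wii", "NES", "Wii", "DS"]})

# pd.factorize returns (codes, uniques); codes are 0-based in order of first appearance
codes, uniques = pd.factorize(df["Platform"])

# Same +1 shift as in the question, so the codes start at 1
df["Platform_code"] = codes + 1

# Keep a label -> code lookup so the encoding can be reversed later
mapping = {label: i + 1 for i, label in enumerate(uniques)}
print(mapping)
```

This does not change how the model is trained; it only preserves information that would otherwise be lost after the conversion.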

With that done, I created the train/test split:

import numpy as np
from sklearn.model_selection import train_test_split

X = numeric_df.drop(['Name','Global_Sales_Bin','Global_Sales','NA_Sales','EU_Sales','JP_Sales','Other_Sales'], axis = 1)
y = numeric_df['Global_Sales_Bin']

X = np.array(X)
y = np.array(y)

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.3, random_state = 0)

After running the model and getting results, I wanted to see the importance of each feature:

model.feature_importances_

Which outputs:

array([ 0.08518705,  0.07874186,  0.06322593,  0.08446309,  0.08410844,
        0.08097326,  0.07744228,  0.1851621 ,  0.23597441,  0.02472158])

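I don't know how to match the features in the model to the numbers above. Both 'X' and 'model' are stored as numpy arrays, and the original DataFrame has been cut down to fit the model, so the features don't line up properly. I think I may have to use a for loop and zip, but I'm not sure how.

Thanks.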

This is what ended up working:

list(zip(X_columns, model.feature_importances_))

Output:

[('Platform', 0.085187050413710552),
 ('Year_of_Release', 0.078741862224430401),
 ('Genre', 0.063225925635322172),
 ('Publisher', 0.084463091000316695),
 ('Critic_Score', 0.084108440698256848),
 ('Critic_Count', 0.080973259803115372),
 ('User_Score', 0.077442278687036153),
 ('User_Count', 0.18516210213713488),
 ('Developer', 0.23597440837370295),
 ('Rating', 0.024721581026973961)]
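The post does not show where X_columns comes from. One plausible way, sketched here under stated assumptions, is to save the column labels before np.array(X) discards them. The all-zero DataFrame and its column order are hypothetical placeholders, and the importance values are copied from the question as a stand-in for model.feature_importances_:

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder frame; the column order is an assumption about the CSV
all_columns = ["Name", "Platform", "Year_of_Release", "Genre", "Publisher",
               "NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales",
               "Critic_Score", "Critic_Count", "User_Score", "User_Count",
               "Developer", "Rating", "Global_Sales_Bin"]
numeric_df = pd.DataFrame(np.zeros((2, len(all_columns))), columns=all_columns)

X = numeric_df.drop(["Name", "Global_Sales_Bin", "Global_Sales", "NA_Sales",
                     "EU_Sales", "JP_Sales", "Other_Sales"], axis=1)

# Capture the labels while X is still a DataFrame
X_columns = list(X.columns)

# After this line the labels are gone; only positions remain
X = np.array(X)

# Stand-in for model.feature_importances_ (values taken from the question)
importances = [0.08518705, 0.07874186, 0.06322593, 0.08446309, 0.08410844,
               0.08097326, 0.07744228, 0.1851621, 0.23597441, 0.02472158]

# Position i of the importances corresponds to column i of X
pairs = list(zip(X_columns, importances))
for name, score in pairs:
    print(f"{name}: {score:.4f}")
```

The pairing relies on the importances being reported in the same order as the columns of the X passed to fit, so X_columns has to be taken from exactly that X and not from the full original DataFrame.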

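See this example, which is about this exact problem.

Thanks for posting this, but it doesn't 100% solve my problem. I get a list of "feature #" entries and importances, but I need to know the name of each feature, i.e. 'Publisher', 'Developer', 'Platform', and so on. I've been trying to reverse-engineer that solution to fit what I already have, but I can't get it to work; I'm still fairly new to Python.

In that example, where they print the feature importances, just replace index[f] with column[index[f]], where column is the list of columns you sent to the model for evaluation.

Thanks for the help, but I still couldn't get it to work. In the end I used this:

list(zip(X_columns, model.feature_importances_))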

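For reference, the index-based suggestion from the comments might look roughly like the sketch below. It is an approximation under assumptions: the linked example is not reproduced here, so a plain Python sorted() call stands in for however that example builds its index array, and the names column, index, and importances are illustrative. The values are the ones from the output above, rounded as in the question.

```python
# Feature names and importances as reported in this thread
column = ["Platform", "Year_of_Release", "Genre", "Publisher", "Critic_Score",
          "Critic_Count", "User_Score", "User_Count", "Developer", "Rating"]
importances = [0.08518705, 0.07874186, 0.06322593, 0.08446309, 0.08410844,
               0.08097326, 0.07744228, 0.1851621, 0.23597441, 0.02472158]

# Positions of the features, ordered from most to least important
index = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)

# Print the name via column[index[f]] rather than the bare position index[f]
lines = []
for f in range(len(column)):
    lines.append(f"{f + 1}. {column[index[f]]} ({importances[index[f]]:.6f})")
print("\n".join(lines))
```

Compared with the zip approach, this also ranks the features, which makes it easier to see that Developer and User_Count dominate in this model.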