Python: Printing the decision trees and feature importances when using BaggingClassifier


python machine-learning scikit-learn


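When using DecisionTreeClassifier in scikit-learn, it is easy to obtain the decision tree and the feature importances. However, I cannot obtain either of them when I use a bagging method such as BaggingClassifier.

Because the model has to be fitted through BaggingClassifier, I cannot get back the results associated with DecisionTreeClassifier (printing the tree as a graph, the feature importances, and so on).

Here is my script: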

seed = 7
n_iterations = 199
DTC = DecisionTreeClassifier(random_state=seed,
                             max_depth=None,
                             min_impurity_split=0.2,
                             min_samples_leaf=6,
                             max_features=None,  # If None, then max_features=n_features.
                             max_leaf_nodes=20,
                             criterion='gini',
                             splitter='best')

#parametersDTC = {'max_depth':range(3,10), 'max_leaf_nodes':range(10, 30)}
parameters = {'max_features':range(1,200)}
dt = RandomizedSearchCV(BaggingClassifier(base_estimator=DTC,
                                          #max_samples=1,
                                          n_estimators=100,
                                          #max_features=1,
                                          bootstrap=False,
                                          bootstrap_features=True,
                                          random_state=seed),
                        parameters, n_iter=n_iterations, n_jobs=14, cv=kfold,
                        error_score='raise', random_state=seed, refit=True)  #min_samples_leaf=10

# Fit the model

fit_dt= dt.fit(X_train, Y_train)
print(dir(fit_dt))
tree_model = dt.best_estimator_

# Print the important features (NOT WORKING)

features = tree_model.feature_importances_
print(features)

rank = np.argsort(features)[::-1]
print(rank[:12])
print(sorted(list(zip(features))))

# Importing the image (NOT WORKING)
from sklearn.externals.six import StringIO

tree.export_graphviz(dt.best_estimator_, out_file='tree.dot') # necessary to plot the graph

dot_data = StringIO() # need to understand but it probably relates to read of strings
tree.export_graphviz(dt.best_estimator_, out_file=dot_data, filled=True, class_names= target_names, rounded=True, special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())

img = Image(graph.create_png())
print(dir(img)) # with dir we can check what are the possibilities in graph.create_png

with open("my_tree.png", "wb") as png:
    png.write(img.data)

The errors I get are: 'BaggingClassifier' object has no attribute 'tree_' and 'BaggingClassifier' object has no attribute 'feature_importances_'. Does anyone know how I can get them? Thanks.

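A BaggingClassifier object indeed has no feature_importances_ attribute. You can still compute the importances yourself from the fitted trees, as described in the answer to a related question.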
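One way to compute the importances by hand is to average the per-tree importances. The sketch below is not taken from the linked answer; it assumes the base estimator is a decision tree and uses the iris data purely for illustration. The helper name `bagging_feature_importances` is hypothetical. Because the question sets `bootstrap_features=True`, each tree may see a resampled set of columns, so the sketch maps each tree's importances back to the original columns through `estimators_features_`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier


def bagging_feature_importances(bagging, n_features):
    """Average tree importances over the ensemble (hypothetical helper)."""
    totals = np.zeros(n_features)
    # estimators_features_[i] holds the column indices used by estimators_[i];
    # with bootstrap_features=True the same column can appear more than once.
    for tree, feat_idx in zip(bagging.estimators_, bagging.estimators_features_):
        # np.add.at accumulates correctly even when feat_idx has duplicates.
        np.add.at(totals, feat_idx, tree.feature_importances_)
    return totals / len(bagging.estimators_)


iris = load_iris()
# The base estimator is passed positionally because its keyword name
# differs between scikit-learn versions (base_estimator vs. estimator).
clf = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                        n_estimators=10,
                        bootstrap_features=True,
                        random_state=0)
clf.fit(iris.data, iris.target)

importances = bagging_feature_importances(clf, iris.data.shape[1])
rank = np.argsort(importances)[::-1]  # most important feature first
print(importances)
print(rank)
```

With a RandomizedSearchCV wrapper as in the question, the same helper would be called on `dt.best_estimator_` rather than on the search object itself.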

You can access the trees that were produced while fitting the BaggingClassifier through the estimators_ attribute, as in the following example:

from sklearn import datasets
from sklearn.ensemble import BaggingClassifier

iris = datasets.load_iris()
clf = BaggingClassifier(n_estimators=3)  # the default base estimator is a decision tree
clf.fit(iris.data, iris.target)
clf.estimators_  # list of the fitted trees

clf.estimators_ is a list of the three fitted decision trees:

[DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
             max_features=None, max_leaf_nodes=None,
             min_impurity_split=1e-07, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             presort=False, random_state=1422640898, splitter='best'),
 DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
             max_features=None, max_leaf_nodes=None,
             min_impurity_split=1e-07, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             presort=False, random_state=1968165419, splitter='best'),
 DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
             max_features=None, max_leaf_nodes=None,
             min_impurity_split=1e-07, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             presort=False, random_state=2103976874, splitter='best')]

So you can iterate over the list and access each tree individually.
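As a minimal sketch of that iteration, assuming the same iris setup as in the example above, each fitted tree can be passed to `export_graphviz` on its own. Here `out_file=None` is used so that the DOT source is returned as a string, which avoids the StringIO buffer from the question; rendering to PNG (for instance with pydotplus, as in the question) is left out.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import export_graphviz

iris = load_iris()
clf = BaggingClassifier(n_estimators=3, random_state=0)
clf.fit(iris.data, iris.target)

dot_sources = []
for t in clf.estimators_:
    # t is a single fitted DecisionTreeClassifier, so it has a tree_ attribute
    # and can be exported; the BaggingClassifier itself cannot.
    dot = export_graphviz(t,
                          out_file=None,  # return the DOT text instead of writing a file
                          class_names=list(iris.target_names),
                          filled=True,
                          rounded=True,
                          special_characters=True)
    dot_sources.append(dot)

print(len(dot_sources))
print(dot_sources[0][:60])
```

Each string in `dot_sources` could then be handed to `pydotplus.graph_from_dot_data` exactly as the question does for a single tree.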

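Possible duplicate

@MikhailKorobov This is not a duplicate of the linked question. The linked question only discusses the feature importances attribute, while the OP is also interested in accessing the trees themselves.

Thanks @Miriam Farber for providing the list of decision trees. If I want to print the trees (import the image) using the script above, should I run each decision tree separately with the parameters returned in the list?

@MauroNogueira Yes. You don't need to copy the parameters, though. You can just write "for t in clf.estimators_" and then, inside the loop, run the code you used before for a single tree. Each "t" in the loop is a fitted decision tree.

I have tried that, but I get an error in export_graphviz, such as 'list' object has no attribute 'tree_':

for t in dt.estimators_:
    export_graphviz(dt.estimators_, out_file='tree.dot')
    dot_data = StringIO()  # read of strings
    export_graphviz(dt.estimators_, out_file=dot_data, filled=True, class_names=target_names, rounded=True, special_characters=True)
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    img = Image(graph.create_png())
    print(dir(img))
    with open("HDAC8_tree.png", "wb") as png:
        png.write(img.data)

@MauroNogueira I think you need to replace dt.estimators_ with dt.best_estimator_.estimators_ (in my example, clf was the BaggingClassifier object; in your code you additionally ran a parameter search on top of it).

@MauroNogueira In the code in your comment, in the line right after "for t in dt.estimators_:", that is, export_graphviz(dt.estimators_, out_file='tree.dot'), you should replace the second dt.estimators_ with t (because t is a tree, while dt.estimators_ is a list of trees). The same applies everywhere else: once you have t, you need to use it directly.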
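Putting the two corrections from the comments together might look like the sketch below. It is an illustration on the iris data with a deliberately tiny search, not the OP's actual setup; the variable names `search`, `bagging`, and `dot_by_tree` are made up for the example. Since the search varies `max_features`, each tree may be trained on a subset of columns, so the feature names passed to `export_graphviz` are looked up through `estimators_features_`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()

# Small stand-in for the search in the question (assumed values, for speed).
search = RandomizedSearchCV(
    BaggingClassifier(DecisionTreeClassifier(random_state=0),
                      n_estimators=3, random_state=0),
    {"max_features": [2, 3, 4]},
    n_iter=2, cv=3, random_state=0, refit=True)
search.fit(iris.data, iris.target)

# The search object wraps the ensemble: the trees live on best_estimator_.
bagging = search.best_estimator_

dot_by_tree = []
for t, feat_idx in zip(bagging.estimators_, bagging.estimators_features_):
    # Use t (one tree) inside the loop, never the whole list of trees.
    names = [iris.feature_names[j] for j in feat_idx]
    dot_by_tree.append(export_graphviz(t,
                                       out_file=None,
                                       feature_names=names,
                                       class_names=list(iris.target_names),
                                       filled=True,
                                       rounded=True))

print(search.best_params_)
print(len(dot_by_tree))
```

Writing each entry of `dot_by_tree` to its own file, for example with an index in the file name, keeps one tree from overwriting another, which a fixed name such as 'tree.dot' inside the loop would do.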