Python: Extracting decision rules from a GradientBoostingClassifier

python scikit-learn sas graphviz

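I have already gone through the following questions:

However, the two above do not solve my problem. Here is my question:

I need to build a model in Python using GradientBoostingClassifier and implement that model on the SAS platform. To do this, I need to extract the decision rules from the GradientBoostingClassifier.

Here is what I have tried so far:

Build a model on the IRIS data: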

# import the most common dataset
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import export_graphviz
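from io import StringIO  # sklearn.externals.six is no longer available in recent scikit-learn releases
import pydotplus  # used by plot_tree below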
from IPython.display import Image

X, y = load_iris(return_X_y=True)
# there are 150 observations and 4 features
print(X.shape) # (150, 4)
# let's build a small model = 5 trees with depth no more than 3
model = GradientBoostingClassifier(n_estimators=5, max_depth=3, learning_rate=1.0)
model.fit(X, y==2) # predict 2nd class vs rest, for simplicity
# we can access individual trees
trees = model.estimators_.ravel()

def plot_tree(clf):
    dot_data = StringIO()
    export_graphviz(clf, out_file=dot_data, node_ids=True,
                    filled=True, rounded=True, 
                    special_characters=True)
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    return Image(graph.create_png())

# now we can plot the first tree
plot_tree(trees[0])

After plotting the graph, I inspected the graph source of the first tree and wrote it to a text file with the following code:

with open(r"C:\Users\XXXX\Desktop\Python\input_tree.txt", "w") as wrt:
    wrt.write(export_graphviz(trees[0], out_file=None, node_ids=True,
                filled=True, rounded=True, 
                special_characters=True))

Here is the output file:

digraph Tree {
node [shape=box, style="filled, rounded", color="black", fontname=helvetica] ;
edge [fontname=helvetica] ;
0 [label=<node &#35;0<br/>X<SUB>3</SUB> &le; 1.75<br/>friedman_mse = 0.222<br/>samples = 150<br/>value = 0.0>, fillcolor="#e5813955"] ;
1 [label=<node &#35;1<br/>X<SUB>2</SUB> &le; 4.95<br/>friedman_mse = 0.046<br/>samples = 104<br/>value = -0.285>, fillcolor="#e5813945"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label=<node &#35;2<br/>X<SUB>3</SUB> &le; 1.65<br/>friedman_mse = 0.01<br/>samples = 98<br/>value = -0.323>, fillcolor="#e5813943"] ;
1 -> 2 ;
3 [label=<node &#35;3<br/>friedman_mse = 0.0<br/>samples = 97<br/>value = -1.5>, fillcolor="#e5813900"] ;
2 -> 3 ;
4 [label=<node &#35;4<br/>friedman_mse = -0.0<br/>samples = 1<br/>value = 3.0>, fillcolor="#e58139ff"] ;
2 -> 4 ;
5 [label=<node &#35;5<br/>X<SUB>3</SUB> &le; 1.55<br/>friedman_mse = 0.222<br/>samples = 6<br/>value = 0.333>, fillcolor="#e5813968"] ;
1 -> 5 ;
6 [label=<node &#35;6<br/>friedman_mse = 0.0<br/>samples = 3<br/>value = 3.0>, fillcolor="#e58139ff"] ;
5 -> 6 ;
7 [label=<node &#35;7<br/>friedman_mse = 0.222<br/>samples = 3<br/>value = 0.0>, fillcolor="#e5813955"] ;
5 -> 7 ;
8 [label=<node &#35;8<br/>X<SUB>2</SUB> &le; 4.85<br/>friedman_mse = 0.021<br/>samples = 46<br/>value = 0.645>, fillcolor="#e581397a"] ;
0 -> 8 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
9 [label=<node &#35;9<br/>X<SUB>1</SUB> &le; 3.1<br/>friedman_mse = 0.222<br/>samples = 3<br/>value = 0.333>, fillcolor="#e5813968"] ;
8 -> 9 ;
10 [label=<node &#35;10<br/>friedman_mse = 0.0<br/>samples = 2<br/>value = 3.0>, fillcolor="#e58139ff"] ;
9 -> 10 ;
11 [label=<node &#35;11<br/>friedman_mse = -0.0<br/>samples = 1<br/>value = -1.5>, fillcolor="#e5813900"] ;
9 -> 11 ;
12 [label=<node &#35;12<br/>friedman_mse = -0.0<br/>samples = 43<br/>value = 3.0>, fillcolor="#e58139ff"] ;
8 -> 12 ;
}

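As you can see, one piece is missing from the output file: I cannot open and close the SAS do-end blocks correctly. For that I need to use the node numbers, but I have not managed to, because I cannot find any pattern in them.

Can anyone help me with this?

Apart from that, unlike with DecisionTreeClassifier, I cannot extract children_left, children_right and threshold as described in the second link above. I have successfully extracted each tree of the GBM: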

trees = model.estimators_.ravel()

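But I have not found any useful function for extracting the values and rules of each tree. Please help if I can use the graphviz object in a way similar to DecisionTreeClassifier,

or

please help me with any other method that solves my problem.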

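There is no need to use the graphviz export to access the decision tree data. model.estimators_ contains all the individual estimators that the model consists of. In the case of a GradientBoostingClassifier, this is a 2D numpy array with shape (n_estimators, n_classes), and each item is a DecisionTreeRegressor.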
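To make the shape of model.estimators_ concrete, here is a minimal sketch on the iris data from the question. The variable names (binary, multi, first_tree) and random_state=0 are my own choices for illustration. Note that for a two-class target, such as y == 2 in the question, the second dimension is 1 rather than 2, because a single tree per boosting stage is fitted.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)

# Binary target (class 2 vs. rest), as in the question:
# one regression tree per boosting stage.
binary = GradientBoostingClassifier(n_estimators=5, max_depth=3, random_state=0)
binary.fit(X, y == 2)
print(binary.estimators_.shape)

# Three-class target: one regression tree per stage and per class.
multi = GradientBoostingClassifier(n_estimators=5, max_depth=3, random_state=0)
multi.fit(X, y)
print(multi.estimators_.shape)

# Every entry is a regression tree, even though the ensemble is a classifier.
first_tree = binary.estimators_[0, 0]
print(type(first_tree).__name__)
```

This is why model.estimators_.ravel() in the question yields a flat list of five trees: the array there has a single column.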

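Each decision tree has a tree_ attribute, and the scikit-learn example on understanding the decision tree structure shows how to extract the nodes, thresholds and children from that object.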
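Since the goal in the question is nested do-end blocks for SAS, a recursive walk over tree_ may be easier than parsing node numbers out of the graphviz text. The following is only a sketch under assumptions: tree_to_sas is a hypothetical helper, the column names x0..x3 and the accumulator score are made up, and the emitted SAS syntax has not been run in SAS. In the outputs shown on this page the node ids follow a depth-first, pre-order numbering (the left child of node i is i + 1), but the code does not rely on that; it only follows children_left and children_right.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier


def tree_to_sas(dtree, feature_names, target="score", scale=1.0):
    """Return SAS-style nested IF/ELSE DO blocks for one fitted tree."""
    t = dtree.tree_
    lines = []

    def walk(node, depth):
        pad = "  " * depth
        if t.children_left[node] == -1:  # -1 marks a leaf
            # A leaf adds its (scaled) value to the running raw score.
            contribution = scale * float(t.value[node][0][0])
            lines.append(f"{pad}{target} = {target} + ({contribution:.10g});")
            return
        name = feature_names[t.feature[node]]
        lines.append(f"{pad}if {name} <= {t.threshold[node]:.10g} then do;")
        walk(t.children_left[node], depth + 1)   # "True" branch
        lines.append(f"{pad}end;")
        lines.append(f"{pad}else do;")
        walk(t.children_right[node], depth + 1)  # "False" branch
        lines.append(f"{pad}end;")

    walk(0, 0)
    return lines


X, y = load_iris(return_X_y=True)
model = GradientBoostingClassifier(n_estimators=5, max_depth=3,
                                   learning_rate=1.0, random_state=0)
model.fit(X, y == 2)
trees = model.estimators_.ravel()

names = ["x0", "x1", "x2", "x3"]  # hypothetical SAS column names
sas_lines = tree_to_sas(trees[0], names, scale=model.learning_rate)
print("\n".join(sas_lines))
```

Because every do; is emitted together with its matching end; inside the same recursive call, the blocks always close in the right order, whatever the tree shape. The leaf values are contributions to a raw score, not class labels, so the generated rules for all trees would have to be applied in sequence to the same score variable.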


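import numpy
import pandas
from sklearn.ensemble import GradientBoostingClassifier

est = GradientBoostingClassifier(n_estimators=4)
numpy.random.seed(1)
est.fit(numpy.random.random((100, 3)), numpy.random.choice([0, 1, 2], size=(100,)))
print('s', est.estimators_.shape)

# estimators_ has shape (n_estimators, n_classes)
n_estimators, n_classes = est.estimators_.shape
for c in range(n_classes):
    for t in range(n_estimators):
        dtree = est.estimators_[t, c]
        print("class={}, tree={}: {}".format(c, t, dtree.tree_))

        # one row per node; leaves have children -1 and feature -2
        rules = pandas.DataFrame({
            'child_left': dtree.tree_.children_left,
            'child_right': dtree.tree_.children_right,
            'feature': dtree.tree_.feature,
            'threshold': dtree.tree_.threshold,
        })
        print(rules)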

This outputs something like the following for each tree:

class=0, tree=0: <sklearn.tree._tree.Tree object at 0x7f18a697f370>
   child_left  child_right  feature  threshold
0           1            2        0   0.020702
1          -1           -1       -2  -2.000000
2           3            6        1   0.879058
3           4            5        1   0.543716
4          -1           -1       -2  -2.000000
5          -1           -1       -2  -2.000000
6           7            8        0   0.292586
7          -1           -1       -2  -2.000000
8          -1           -1       -2  -2.000000
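In the table above, rows with child_left = -1 are leaves (their feature and threshold are placeholders, -2). The table does not show the leaf values, which live in tree_.value and are what a SAS scoring step would have to add up. As a sanity check before porting, the sketch below rebuilds the raw score of a binary model from the individual trees and compares it with decision_function. The learning_rate of 0.5 and random_state=0 are my own choices, picked so that the scaling is visible; the claim that the constant offset equals the log-odds of the positive-class prior assumes the default init.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
target = (y == 2)

model = GradientBoostingClassifier(n_estimators=5, max_depth=3,
                                   learning_rate=0.5, random_state=0)
model.fit(X, target)

# Sum of the per-tree leaf values, each scaled by the learning rate.
tree_sum = np.zeros(X.shape[0])
for dtree in model.estimators_[:, 0]:
    tree_sum += model.learning_rate * dtree.predict(X)

# Whatever is left of the raw score is the constant initial prediction.
raw = model.decision_function(X)
offset = raw - tree_sum

prior = target.mean()
log_odds = np.log(prior / (1.0 - prior))
print(offset[:3], log_odds)

# For the binary case the probability is the logistic function of the raw score.
proba = 1.0 / (1.0 + np.exp(-raw))
print(np.allclose(proba, model.predict_proba(X)[:, 1]))
```

So a port to SAS would need three pieces: the constant offset, the nested rules of every tree with leaf values multiplied by the learning rate, and a final logistic transform of the accumulated score.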

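Can you export the model to a PMML file?

Apparently I cannot export to a PMML file, because the sklearn2pmml package is not on our company's server. However, I have requested that this package be installed on my system, so I may be able to export to PMML in the future. Can you tell me how to extract the decision rules from a PMML file?

Are you using SAS EM or Model Manager? They can import PMML directly.

Unfortunately I only have access to SAS EG, not SAS EM or Model Manager. Is there a way to import a PMML file in SAS EG?

Hi Jonnor, thanks for the answer. However, I did not find the _tree attribute in model.estimators_ that could be used to get the nodes, thresholds etc. from that object. I tried trees = model.estimators_._trees() and trees = model.estimators_.trees(), but both give the error AttributeError: 'numpy.ndarray' object has no attribute 'trees'. Could you help me with tested code? Thanks a lot for your help.

@Ved, I have attached example code and improved the answer.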