Extracting decision rules from a random forest in Python

machine-learning scikit-learn deep-learning

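I have a question, though. I heard from someone that in R you can use extra packages to extract the decision rules implemented in a random forest. I tried searching for the same thing in Python but had no luck. Any help on how to achieve this would be appreciated.

Thanks in advance.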

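Assuming you use sklearn's RandomForestClassifier, you can find the individual decision trees in its estimators_ attribute. Each tree stores its decision nodes as a number of NumPy arrays under tree_.

Here is some example code which just prints each node in the order of the arrays. In a typical application one would instead traverse the tree by following the children.
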
import numpy
from sklearn.model_selection import train_test_split
from sklearn import datasets, ensemble

def print_decision_rules(rf):

    for tree_idx, est in enumerate(rf.estimators_):
        tree = est.tree_
        assert tree.value.shape[1] == 1 # no support for multi-output

        print('TREE: {}'.format(tree_idx))

        iterator = enumerate(zip(tree.children_left, tree.children_right, tree.feature, tree.threshold, tree.value))
        for node_idx, data in iterator:
            left, right, feature, th, value = data

            # left: index of left child, -1 for a leaf
            # right: index of right child, -1 for a leaf
            # feature: index of the feature to check
            # th: the threshold to compare against
            # value: per-class weights of the training samples that reached this node

            # for a classifier, the predicted class is the one with the largest weight
            class_idx = numpy.argmax(value[0])

            if left == -1 and right == -1:
                print('{} LEAF: return class={}'.format(node_idx, class_idx))
            else:
                # scikit-learn sends samples with feature value <= threshold to the left child
                print('{} NODE: if feature[{}] <= {} then next={} else next={}'.format(node_idx, feature, th, left, right))


digits = datasets.load_digits()
Xtrain, Xtest, ytrain, ytest = train_test_split(digits.data, digits.target)
estimator = ensemble.RandomForestClassifier(n_estimators=3, max_depth=2)
estimator.fit(Xtrain, ytrain)

print_decision_rules(estimator)

Example output:
TREE: 0
0 NODE: if feature[33] <= 2.5 then next=1 else next=4
1 NODE: if feature[38] <= 0.5 then next=2 else next=3
2 LEAF: return class=2
3 LEAF: return class=9
4 NODE: if feature[50] <= 8.5 then next=5 else next=6
5 LEAF: return class=4
6 LEAF: return class=0
...
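
To get actual rules rather than a flat node listing, one can follow the children from the root and collect the conditions along each path. The following is only a minimal sketch of that traversal. The names extract_rules, one_hot and demo_tree are made up for illustration, and demo_tree is a hand-built stand-in that copies TREE 0 from the example output (its leaf weights are hypothetical one-hot values, not taken from a fitted model). With a real forest you would pass est.tree_ for each est in rf.estimators_ instead.

```python
from types import SimpleNamespace


def extract_rules(tree):
    """Return one (conditions, class_index) pair per leaf.

    `tree` needs the array attributes sklearn exposes on `est.tree_`:
    children_left, children_right, feature, threshold, value.
    """
    rules = []

    def walk(node, conditions):
        left = tree.children_left[node]
        right = tree.children_right[node]
        if left == -1 and right == -1:
            # Leaf: the predicted class is the one with the largest weight.
            weights = tree.value[node][0]
            class_idx = max(range(len(weights)), key=lambda i: weights[i])
            rules.append((conditions, class_idx))
            return
        feature, th = tree.feature[node], tree.threshold[node]
        # sklearn sends samples with value <= threshold to the left child.
        walk(left, conditions + ['feature[{}] <= {}'.format(feature, th)])
        walk(right, conditions + ['feature[{}] > {}'.format(feature, th)])

    walk(0, [])
    return rules


def one_hot(class_idx, n_classes=10):
    """Hypothetical stand-in for a node's per-class weights."""
    return [[1.0 if i == class_idx else 0.0 for i in range(n_classes)]]


# Hand-built copy of TREE 0 from the example output (not a fitted model).
demo_tree = SimpleNamespace(
    children_left=[1, 2, -1, -1, 5, -1, -1],
    children_right=[4, 3, -1, -1, 6, -1, -1],
    feature=[33, 38, -2, -2, 50, -2, -2],
    threshold=[2.5, 0.5, -2.0, -2.0, 8.5, -2.0, -2.0],
    value=[one_hot(0), one_hot(0), one_hot(2), one_hot(9),
           one_hot(0), one_hot(4), one_hot(0)],
)

for conditions, class_idx in extract_rules(demo_tree):
    print('if {} then class={}'.format(' and '.join(conditions), class_idx))
```

Each printed line is one root-to-leaf path, so a tree with four leaves gives four rules. The class index refers to a position in the classifier's classes_ array, which for the digits data happens to equal the digit itself.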

We use something similar in emtrees to compile a random forest to C code.
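
To give an idea of what compiling a tree to C can look like, here is a rough sketch that walks the same arrays and emits nested if/else statements as C source text. This is not how emtrees does it, and it does not use the emtrees API. The function name tree_to_c, the generated signature and the hand-built demo_tree (a copy of TREE 0 from the example output, with hypothetical one-hot leaf weights) are all assumptions made for illustration.

```python
from types import SimpleNamespace


def tree_to_c(tree, func_name='predict_tree'):
    """Emit C source for one tree as nested if/else statements."""
    lines = ['int {}(const float *features) {{'.format(func_name)]

    def emit(node, depth):
        pad = '    ' * depth
        left = tree.children_left[node]
        right = tree.children_right[node]
        if left == -1 and right == -1:
            # Leaf: return the index of the class with the largest weight.
            weights = tree.value[node][0]
            class_idx = max(range(len(weights)), key=lambda i: weights[i])
            lines.append('{}return {};'.format(pad, class_idx))
            return
        # Samples with value <= threshold go to the left child.
        lines.append('{}if (features[{}] <= {}f) {{'.format(
            pad, tree.feature[node], float(tree.threshold[node])))
        emit(left, depth + 1)
        lines.append('{}}} else {{'.format(pad))
        emit(right, depth + 1)
        lines.append('{}}}'.format(pad))

    emit(0, 1)
    lines.append('}')
    return '\n'.join(lines)


def one_hot(class_idx, n_classes=10):
    """Hypothetical stand-in for a node's per-class weights."""
    return [[1.0 if i == class_idx else 0.0 for i in range(n_classes)]]


# Hand-built copy of TREE 0 from the example output (not a fitted model).
demo_tree = SimpleNamespace(
    children_left=[1, 2, -1, -1, 5, -1, -1],
    children_right=[4, 3, -1, -1, 6, -1, -1],
    feature=[33, 38, -2, -2, 50, -2, -2],
    threshold=[2.5, 0.5, -2.0, -2.0, 8.5, -2.0, -2.0],
    value=[one_hot(0), one_hot(0), one_hot(2), one_hot(9),
           one_hot(0), one_hot(4), one_hot(0)],
)

c_source = tree_to_c(demo_tree)
print(c_source)
```

For a whole forest you would generate one such function per tree (passing est.tree_ and a distinct func_name each time) and then combine the per-tree results in a separate function.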