I'm trying to tabulate the branches of a BinaryTree (party) into a data frame in R


After fitting a tree with party::ctree(), I want to create a table that describes the characteristics of its branches.

I fitted the tree on these variables:

> summary(juridicos_segmentar)
        actividad_economica
 Financieras      : 89     
 Gubernamental    : 48     
 Sector Primario  : 34     
 Sector Secundario:596     
 Sector Terciario :669     
              ingresos_cut
 (-Inf,1.03e+08]    :931  
 (1.03e+08,4.19e+08]:252  
 (4.19e+08,1.61e+09]:144  
 (1.61e+09, Inf]    :109  

              egresos_cut 
 (-Inf,6e+07]       :922  
 (6e+07,2.67e+08]   :256  
 (2.67e+08,1.03e+09]:132  
 (1.03e+09, Inf]    :126  

             patrimonio_cut
 (-Inf,2.72e+08]    :718   
 (2.72e+08,1.46e+09]:359   
 (1.46e+09,5.83e+09]:191   
 (5.83e+09, Inf]    :168   

   op_ingreso_cut
 (-Inf,3] :1308  
 (3,7]    :  53  
 (7,22]   :  44  
 (22, Inf]:  31
The first is categorical and the others are ordinal. I fitted them against another factor variable:

> summary(as.factor(segmento))
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
 27  66  30  39  36  33  39  15  84  70 271 247 101  34 100  74  47  25  48  50
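For readers who want to reproduce a similar setup: the `*_cut` predictors above look like the result of `cut()` with open-ended breaks, although the post does not show that step. A minimal sketch under that assumption, using made-up income values (`ingresos` is hypothetical; the break points are copied, rounded as printed, from the summary of `ingresos_cut`):

```r
# Hypothetical income values; the real juridicos_segmentar data is not shown in the post.
ingresos <- c(5e7, 2e8, 9e8, 3e9, 1e8)

# Break points copied (as rounded in the printed summary) from ingresos_cut above.
breaks <- c(-Inf, 1.03e8, 4.19e8, 1.61e9, Inf)

# ordered_result = TRUE yields an ordered factor, so ctree() treats it as ordinal.
ingresos_cut <- cut(ingresos, breaks = breaks, ordered_result = TRUE)

is.ordered(ingresos_cut)
table(ingresos_cut)
```

Because the intervals are right-closed, a value equal to a break point falls into the lower bin.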
I used the following code:

library(party)
fit_jur <- ctree(cluster ~ ., 
             data=data.frame(juridicos_segmentar, cluster=as.factor(segmento)))
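The problem is that there are several leaves to characterize, and sometimes a variable appears more than once in a path, so I would like to intersect the conditions, i.e. intersect the ranges.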
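The intersection of ranges described here can be sketched in base R, independently of any tree package. The sketch assumes each split on a path has already been written down as a (variable, operator, threshold) row, with `<=` for a left branch and `>` for a right branch; `path_conditions` and `intersect_conditions()` are hypothetical names, and the example values are illustrative rather than taken from the fitted tree:

```r
# One row per split along a single root-to-leaf path (illustrative values only).
path_conditions <- data.frame(
  variable  = c("Temp", "Wind", "Temp"),
  operator  = c("<=", ">", ">"),
  threshold = c(82, 6.9, 77),
  stringsAsFactors = FALSE
)

# Collapse repeated conditions on the same variable into one interval (lower, upper].
intersect_conditions <- function(conds) {
  per_var <- split(conds, conds$variable)
  out <- lapply(per_var, function(d) {
    # The tightest lower bound is the largest ">" threshold;
    # the tightest upper bound is the smallest "<=" threshold.
    lower <- max(c(-Inf, d$threshold[d$operator == ">"]))
    upper <- min(c(Inf, d$threshold[d$operator == "<="]))
    data.frame(variable = d$variable[1], lower = lower, upper = upper,
               stringsAsFactors = FALSE)
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

ranges <- intersect_conditions(path_conditions)
ranges
```

Here the two conditions on Temp collapse to the single interval (77, 82], while Wind keeps an open upper bound. For ordered factors, the same idea would apply to the integer level codes instead of numeric thresholds.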

I thought of data.tree::ToDataFrameTable, but I don't know how it works with party.

Thanks a lot, everyone.


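You can convert objects of class party (from partykit) and BinaryTree (from party) to a data.tree structure, and then use that to convert to a data frame and/or print. The airquality example below shows the conversion.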

Desired output, as described in the question:

actividad economica      ingresos (rango)   egresos (rango) patrimonio (rango) operaciones de ingreso   segmento
Sector Primario                             <=261.000.000                                                 18
Sector Primario                             >261.000.000                                                  20
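library(party)
library(data.tree)  # provides as.Node() and ToDataFrameTable()

# Fit a conditional inference tree on the rows where Ozone is observed
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(maxsurrogate = 3))

# Convert the BinaryTree to a data.tree structure
tree <- as.Node(airct)

# One row per leaf: path from the root, label, and split statistics
df <- ToDataFrameTable(tree,
      "pathString",
      "label",
      criterion = function(x) round(x$criterion$maxcriterion, 3),
      statistic = function(x) round(max(x$criterion$statistic), 3)
)
df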
Plotting:

# Plot the subtree rooted at node 2
subtree <- Clone(tree$`2`)
SetNodeStyle(subtree, 
             style = "filled,rounded", 
             shape = "box", 
             fillcolor = "GreenYellow", 
             fontname = "helvetica", 
             label = function(x) x$label,
             tooltip = function(x) round(x$criterion$maxcriterion, 3))
plot(subtree)
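This may be similar to what you want to do:
Thank you, Achim. I'm looking into this solution.
Thank you, Christoph. But in this case I need the rules of the paths. Note that the weights are not among the variables being fitted.
Sorry, I need something like this: the decision rules for each variable that was ultimately fitted to predict Ozone.
Wind | Temp | Month | Day | Solar.R | Ozone > z | w | prediction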
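The path rules requested in the comments can be listed per terminal node. A minimal sketch, assuming the partykit implementation of `ctree()` rather than party's, and relying on the unexported helper `partykit:::.list.rules.party()`, which is internal and may change between versions. The `prediction` column is simply the mean observed Ozone in each leaf, and `rules_df` is a hypothetical name:

```r
library(partykit)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq)  # partykit::ctree, not party::ctree

# Named character vector: names are terminal node ids, values are the path rules.
# NOTE: .list.rules.party() is an unexported partykit internal (assumption: it
# is available in the installed version).
rules <- partykit:::.list.rules.party(airct)

# Mean observed Ozone per terminal node, used here as the node prediction.
node_of_obs <- predict(airct, type = "node")
node_mean <- tapply(airq$Ozone, node_of_obs, mean)

rules_df <- data.frame(
  node = as.integer(names(rules)),
  rule = unname(rules),
  prediction = as.numeric(node_mean[names(rules)]),
  stringsAsFactors = FALSE
)
rules_df
```

Each rule is a single string of conditions joined by `&`, so a variable that is split more than once on a path appears more than once in its rule. Producing one column per variable would still require parsing and intersecting those conditions.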
Output of df from the ToDataFrameTable example above:

  pathString        label criterion statistic
1      1/2/3 weights = 10     0.000     0.000
2    1/2/4/5 weights = 48     0.936     6.141
3    1/2/4/6 weights = 21     0.891     5.182
4      1/7/8 weights = 30     0.675     3.159
5      1/7/9  weights = 7     0.000     0.000
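The `pathString` column encodes each leaf's route from the root as node ids separated by `/`, so the leaf id, its depth, and the number of observations can be recovered with base R. A small sketch that re-enters the printed table by hand (only the two columns it needs), so it runs without party or data.tree; the derived column names `leaf`, `depth` and `n` are chosen here for illustration:

```r
# Leaf table copied from the printed output above.
df <- data.frame(
  pathString = c("1/2/3", "1/2/4/5", "1/2/4/6", "1/7/8", "1/7/9"),
  label = c("weights = 10", "weights = 48", "weights = 21",
            "weights = 30", "weights = 7"),
  stringsAsFactors = FALSE
)

# Split each path into its node ids.
path_ids <- strsplit(df$pathString, "/", fixed = TRUE)

# The last id on the path is the terminal node; depth counts the splits above it.
df$leaf  <- as.integer(vapply(path_ids, function(p) p[length(p)], character(1)))
df$depth <- lengths(path_ids) - 1L

# The label carries the number of observations that fall into the leaf.
df$n <- as.integer(sub("weights = ", "", df$label, fixed = TRUE))

df[, c("leaf", "depth", "n")]
```

The leaf sizes sum to 116, the number of rows in airquality with a non-missing Ozone value.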