How to derive the standard deviation of leaf nodes (rpart)? (R, Decision Tree, Rpart)

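I built a regression tree with rpart to assess walking in elderly people based on several variables. Besides the plot, I want to use the output in other software for further analysis. Is it possible to derive from the leaf nodes not only the mean walking value of each group, but also the standard deviation (of walking)?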

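[Figure: decision tree built with rpart]

modelRT

I don't think you can get this from the plot, but you can certainly get the standard deviation of each leaf node from the rpart model. Since you did not provide data, I will use the built-in iris data as an example. Since you are interested in regression, I will drop the class variable (Species) and predict Sepal.Length from the other variables.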

Setup

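library(rpart)
library(rpart.plot)
library(partykit)   # provides as.party() and a plot method that shows node numbers

RP = rpart(Sepal.Length ~ ., data=iris[,-5])
plot(as.party(RP))  # rpart.plot() only accepts rpart objects, so use plot() here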

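As you can see, nodes 4, 5, 6, 10, 11, 12 and 13 are leaf nodes. Part of the returned structure is RP$where, which tells you which leaf each original instance ended up in. So you just need to aggregate using this variable:
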
SD = aggregate(iris$Sepal.Length, list(RP$where), sd)
SD
  Group.1         x
1       4 0.2390221
2       5 0.2888391
3       6 0.2500526
4      10 0.4039577
5      11 0.3802046
6      12 0.3020486
7      13 0.2279132
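
Since the goal is to carry the results into other software, it may be convenient to collect the count, mean and standard deviation of each leaf in one table and write it to a file. The following is a minimal sketch along those lines; the names leaf_stats and out_file are made up for illustration, and CSV is just one possible exchange format.

```r
library(rpart)

RP <- rpart(Sepal.Length ~ ., data = iris[, -5])

# RP$where gives, for every training row, the row of RP$frame that is its leaf.
leaf <- RP$where

# One row per leaf: size, mean and sample standard deviation of the response.
# tapply() orders its groups the same way as sort(unique(leaf)).
leaf_stats <- data.frame(
  leaf = sort(unique(leaf)),
  n    = as.vector(tapply(iris$Sepal.Length, leaf, length)),
  mean = as.vector(tapply(iris$Sepal.Length, leaf, mean)),
  sd   = as.vector(tapply(iris$Sepal.Length, leaf, sd))
)
print(leaf_stats)

# Hypothetical output location; replace with any path the other software can read.
out_file <- file.path(tempdir(), "leaf_stats.csv")
write.csv(leaf_stats, out_file, row.names = FALSE)
```

The mean column should agree with the fitted values stored in RP$frame$yval for the same rows, which gives a quick sanity check that the grouping is right.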

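Group.1 tells you which leaf node, and x tells you the standard deviation of the points that ended up in that leaf. If you want to add the standard deviations to the plot, you can use mtext. After fiddling with the layout a bit: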
rpart.plot(RP)
mtext(text=round(SD$x,1), side=1, line=3.2, at=seq(0.06,1,0.1505))

To draw the standard deviation at every node of the tree, you can use rpart.plot with a node.fun. For example:
library(rpart.plot)
data(iris)
tree = rpart(Sepal.Length~., data=iris, cp=.05) # example tree

# Calculate the standard deviation at each node of the tree.
sd <- sqrt(tree$frame$dev / (tree$frame$n-1))

# Append the standard deviation as an extra column to the tree frame.
tree$frame <- cbind(tree$frame, sd)

# Create a node.fun to print the standard deviation at each node.
# See Chapter 6 of the rpart.plot vignette http://www.milbo.org/doc/prp.pdf.
node.fun.sd <- function(x, labs, digits, varlen)
{
    s <- round(x$frame$sd, 2) # round sd to 2 digits
    paste(labs, "\n\nsd", s)
 }

# Plot the tree, using the node.fun to add the standard deviation to each node
rpart.plot(tree, type=4, node.fun=node.fun.sd)
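
The reason the expression for sd works: for a regression (anova) tree fitted without case weights, the dev column of tree$frame holds the sum of squared deviations of the response from the node mean, so dividing by n-1 and taking the square root gives the ordinary sample standard deviation. Written out, with $n_t$ the number of observations in node $t$ and $\bar{y}_t$ their mean (symbols introduced here only for illustration):

```latex
\mathrm{dev}_t = \sum_{i \in t} \left( y_i - \bar{y}_t \right)^2,
\qquad
s_t = \sqrt{\frac{\mathrm{dev}_t}{n_t - 1}}
```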

# Variant: show the standard deviation at the leaf nodes only.
library(rpart.plot)
data(iris)
tree = rpart(Sepal.Length~., data=iris, cp=.05)

# Standard deviation at every node, then blank out the non-leaf nodes.
sd <- sqrt(tree$frame$dev / (tree$frame$n-1))
is.leaf <- tree$frame$var == "<leaf>" # logical vec, indexed on row in frame
sd[!is.leaf] <- NA # change sd of non-leaf nodes to NA
tree$frame <- cbind(tree$frame, sd)

# node.fun that appends the sd label only where an sd is available.
node.fun2 <- function(x, labs, digits, varlen)
{
    s <- paste("\n\nsd", round(x$frame$sd, 2)) # round sd to 2 digits
    s[is.na(x$frame$sd)] <- "" # delete NAs
    paste(labs, s)
}
rpart.plot(tree, type=4, node.fun=node.fun2)
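
As a cross-check, the frame-based standard deviations can be compared with the ones obtained by grouping the raw data on tree$where. This is only a sketch; the names sd_frame, sd_direct and leaf_rows are illustrative, and the agreement relies on the tree being an unweighted regression tree.

```r
library(rpart)

tree <- rpart(Sepal.Length ~ ., data = iris, cp = 0.05)

# Rows of tree$frame that are leaves.
leaf_rows <- which(tree$frame$var == "<leaf>")

# Approach 1: from the deviance and node size stored in the frame.
sd_frame <- sqrt(tree$frame$dev / (tree$frame$n - 1))[leaf_rows]

# Approach 2: directly from the data, grouped by the leaf each row fell into.
# The names of the tapply() result are the frame row indices as strings.
sd_direct <- tapply(iris$Sepal.Length, tree$where, sd)
sd_direct <- as.vector(sd_direct[as.character(leaf_rows)])

print(data.frame(frame_row = leaf_rows, sd_frame = sd_frame, sd_direct = sd_direct))
```

If the two columns differ by more than rounding error, the tree was probably fitted with weights or a different method, and the dev-based formula would need to be revisited.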