How can I visualize a contingency table with several variables as a decision tree in R?


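For example, suppose I have a respondent and ask whether he/she has a disease. From there I ask whether his/her father had the disease. If the answer to that second question is yes, I then ask whether the father is now cured. If the father did not have the disease, that question does not apply.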

Can I create a "decision tree" like this in R, or elsewhere?

Here is the available data, where 0 means "no" and 1 means "yes":


There is indeed a package for creating this type of graphic, conveniently named diagram.

It is not an automatic plotting routine like barplot() or qplot(), but it is a tool you can use to build the graphic you want.

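If you are disciplined about it, you can write code that makes the process more automatic and tailored to your particular data and situation.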

The package is named diagram. More information about it is available in its PDF documentation.
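As a rough illustration of the "build it yourself" approach, here is a minimal sketch that draws the question flow with diagram. It assumes the package is installed and relies on its openplotmat(), straightarrow() and textrect() functions. The example vectors are the ones used in the data.tree answer in this thread; the box positions, box labels and the `edges` table are hypothetical choices made for this sketch, not part of the package.

```r
# Example vectors copied from the data.tree answer in this thread
person_disease <- c(rep(1, 10), rep(0, 20))
father_disease <- c(rep(1, 7), rep(0, 18), rep(1, 5))
father_cured   <- c(rep(0, 4), rep(1, 3), rep(NA, 18), rep(1, 5))

# Counts to print on the branches (NA = question not applicable)
n_father_yes <- sum(father_disease == 1)
n_father_no  <- sum(father_disease == 0)
n_cured_yes  <- sum(father_cured == 1, na.rm = TRUE)
n_cured_no   <- sum(father_cured == 0, na.rm = TRUE)

# Hand-picked (x, y) centres and labels for the five boxes (assumed layout)
pos <- rbind(c(0.50, 0.85), c(0.25, 0.50), c(0.75, 0.50),
             c(0.50, 0.15), c(0.88, 0.15))
boxes <- c("Father had disease?", "Not applicable", "Father cured?",
           "Not cured", "Cured")

# One row per branch: which box it leaves, which box it enters, and its label
edges <- data.frame(
  from  = c(1, 1, 3, 3),
  to    = c(2, 3, 4, 5),
  label = c(paste("no:", n_father_no), paste("yes:", n_father_yes),
            paste("no:", n_cured_no), paste("yes:", n_cured_yes))
)

if (requireNamespace("diagram", quietly = TRUE)) {
  library(diagram)
  openplotmat()  # empty canvas with coordinates running from 0 to 1
  # Draw the arrows first so the boxes are painted on top of them
  for (i in seq_len(nrow(edges))) {
    a <- pos[edges$from[i], ]
    b <- pos[edges$to[i], ]
    straightarrow(from = a, to = b, arr.pos = 0.6)
    text(x = (a[1] + b[1]) / 2, y = (a[2] + b[2]) / 2,
         labels = edges$label[i], pos = 4)
  }
  for (i in seq_along(boxes)) {
    textrect(mid = pos[i, ], radx = 0.12, rady = 0.05, lab = boxes[i], cex = 0.9)
  }
}
```

Keeping the branches in a small table like `edges` is one way to make the drawing more automatic: adding a question means adding a row and a box rather than rewriting the plotting calls.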


You can use the data.tree package for this. There are many ways to do what you want. For example:

person_disease <- c(rep(1, 10), rep(0, 20))
father_disease <- c(rep(1, 7), rep(0, 18), rep(1, 5))
father_cured <- c(rep(0, 4), rep(1, 3), rep(NA, 18), rep(1, 5))
df <- data.frame(person_disease, father_disease, father_cured)

library(data.tree)

# Here the tree is constructed "manually".
# Depending on your data and your needs, you might want to generate the tree directly from the data;
# many examples for this are available in the vignettes, see browseVignettes("data.tree").
# Each child stores a `condition` function that filters the rows of its parent.
# which() drops comparisons that evaluate to NA (father_cured is NA when the question
# does not apply); without it those rows come back as all-NA rows and inflate nrow().
disease <- Node$new("Disease", data = df)
father_disease_yes <- disease$AddChild("Father Disease Yes", label = "Father Disease", edge = "yes", condition = function(df) df[which(df$person_disease == 1), ])
father_cured_yes <- father_disease_yes$AddChild("Father Cured Yes", label = "Father Cured", edge = "yes", condition = function(df) df[which(df$father_cured == 1), ])
father_disease_no <- disease$AddChild("Father Disease No", label = "Father Disease", edge = "no", condition = function(df) df[which(df$person_disease == 0), ])


#data filter (pre-order)
#an alternative would be to do this recursively
disease$Do(function(node) {
  for (child in node$children) {
    child$data <- child$condition(node$data)
  }
})

print(disease, total = function(node) nrow(node$data))


#plotting
#(many more options are available, see ?plot.Node)
SetEdgeStyle(disease,
             fontname = "helvetica",
             arrowhead = "none",
             label = function(node) paste0(node$edge, "\n", "total = ", nrow(node$data)))

SetNodeStyle(disease,
             fontname = "helvetica",
             label = function(node) node$label)

plot(disease)
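To sanity-check the totals printed on a tree like this, it can help to look at the underlying contingency table directly. The following is a small base R sketch, not part of the answer above, that cross-tabulates the same three variables and keeps the "not applicable" (NA) answers as a category of their own.

```r
person_disease <- c(rep(1, 10), rep(0, 20))
father_disease <- c(rep(1, 7), rep(0, 18), rep(1, 5))
father_cured   <- c(rep(0, 4), rep(1, 3), rep(NA, 18), rep(1, 5))
df <- data.frame(person_disease, father_disease, father_cured)

# Three-way contingency table; useNA = "ifany" adds an NA level to any
# variable with missing values (here only father_cured)
counts <- table(df, useNA = "ifany")

# ftable() flattens the three-way table into a single readable grid
print(ftable(counts))
```

Each cell of the flattened table corresponds to one path through the questions, so a node whose total does not match any sum of cells usually points to a filter that mishandles NA.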
Perhaps the partykit package will help.
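The answer gives no code, so the following is only a sketch of one way partykit could be used here: building the tree by hand from partysplit(), partynode() and party() rather than fitting a model. It assumes partykit is installed; the `yes_no` helper, the `survey` data frame and the leaf descriptions are hypothetical names introduced for this illustration, and the example vectors are the ones from the data.tree answer.

```r
person_disease <- c(rep(1, 10), rep(0, 20))
father_disease <- c(rep(1, 7), rep(0, 18), rep(1, 5))
father_cured   <- c(rep(0, 4), rep(1, 3), rep(NA, 18), rep(1, 5))

# Hypothetical helper: recode 0/1 answers as a no/yes factor (NA stays NA)
yes_no <- function(x) factor(x, levels = c(0, 1), labels = c("no", "yes"))

survey <- data.frame(
  person_disease = yes_no(person_disease),
  father_disease = yes_no(father_disease),
  father_cured   = yes_no(father_cured)
)

if (requireNamespace("partykit", quietly = TRUE)) {
  library(partykit)
  # Splits refer to columns of `survey` by position; index = 1:2 sends the
  # first factor level ("no") to the first kid and "yes" to the second
  split_father <- partysplit(2L, index = 1:2)
  split_cured  <- partysplit(3L, index = 1:2)

  root <- partynode(1L, split = split_father, kids = list(
    partynode(2L, info = "cure question not applicable"),
    partynode(3L, split = split_cured, kids = list(
      partynode(4L, info = "father not cured"),
      partynode(5L, info = "father cured")))))

  tree <- party(root, data = survey)
  print(tree)
  plot(tree)
}
```

Because the structure is written out explicitly, the skip logic of the questionnaire (no cure question when the father never had the disease) is encoded in the tree itself rather than inferred from the data.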