How to fix the "invalid type (NULL) for variable" error from ctree() in R's party package?

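I'm using ctree() from the party package in R. I want to be able to use columns from several data frames, referring to each column individually (using $), as I have done with this function in the past, but this time it doesn't work.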

To illustrate the error, I put together a sample dataset as a single data frame. When I run:

>ctree(data$adult_age~data$child_age+data$freq)

I get the following error:

Error in model.frame.default(formula = ~data$adult_age, data = list(),  : 
  invalid type (NULL) for variable 'data$adult_age'

If I run it like this instead, it works:

>ctree(adult_age~child_age+freq, data)

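Usually these two ways of writing the call are interchangeable (with lm(), for example, I get the same results either way), but with ctree() I get an error. Why? And how can I fix this so that I can pull data from different data frames in one call without having to combine them?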

My data look like this:

> dput(data)

structure(list(adult_age = c(38, 38, 38, 38, 38, 55.5, 55.5, 38, 38, 38), child_age = c(8, 8, 13, 3.5, 3.5, 13, 8, 8, 8, 13), freq = c(0.1, 12, 0.1, 0.1, 0.1, 0.1, 1, 2, 0.1, 0.1)), .Names = c("adult_age", "child_age", "freq"), class = "data.frame", row.names = c(12L, 13L, 14L, 15L, 18L, 20L, 22L, 23L, 24L, 25L))

If you want to recreate the sample data:

>adult_age = c(38, 38, 38, 38, 38, 55.5, 55.5, 38, 38, 38)

>child_age = c(8, 8, 13, 3.5, 3.5, 13, 8, 8, 8, 13)

>freq = c(0.1, 12, 0.1, 0.1, 0.1, 0.1, 1, 2, 0.1, 0.1)

>data=as.data.frame(cbind(adult_age, child_age, freq))

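Why not to do this

Never use data$ inside a model formula (as @Roland already pointed out). Apart from needlessly repeating the data name and having to type more, it is a source of confusion and errors. If you haven't run into this with lm() so far, then you simply haven't used predict(). Consider a simple linear regression for your data: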

m1 <- lm(adult_age ~ child_age, data = data)
m2 <- lm(data$adult_age ~ data$child_age)
coef(m1) - coef(m2)
## (Intercept)   child_age 
##           0           0

But for the data$ version, newdata is not used at all in the actual prediction:

predict(m2, newdata = data.frame(child_age = 0))
##        1        2        3        4        5        6        7        8 
## 41.14343 41.14343 44.11483 38.46917 38.46917 44.11483 41.14343 41.14343 
##        9       10 
## 41.14343 44.11483 
## Warning message:
## 'newdata' had 1 row but variables found have 10 rows
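A minimal base-R sketch of why newdata is ignored here (the data are rebuilt with data.frame() only so the snippet runs on its own; p2 is just an illustrative name): the data$ model stores its regressor under the literal expression data$child_age, not child_age, so a newdata column called child_age is never matched.

```r
# Sample data from the question, rebuilt so this snippet is self-contained
data <- data.frame(
  adult_age = c(38, 38, 38, 38, 38, 55.5, 55.5, 38, 38, 38),
  child_age = c(8, 8, 13, 3.5, 3.5, 13, 8, 8, 8, 13),
  freq      = c(0.1, 12, 0.1, 0.1, 0.1, 0.1, 1, 2, 0.1, 0.1)
)

m2 <- lm(data$adult_age ~ data$child_age)

# The regressor is named after the whole expression, not after the column
names(coef(m2))  # c("(Intercept)", "data$child_age")

# predict() evaluates `data$child_age` inside newdata first; newdata has no
# column called `data`, so the lookup falls through to the global `data`
# object and all 10 original rows are used (hence the warning)
p2 <- suppressWarnings(predict(m2, newdata = data.frame(child_age = 0)))
length(p2)  # 10
```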

There are many more examples like this, but this one alone should be serious enough to make you avoid the practice.
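For contrast, a short sketch of the recommended form with the same data: with plain column names in the formula and the data frame passed via data =, predict() returns exactly one value per row of newdata (p1 is an illustrative name).

```r
data <- data.frame(
  adult_age = c(38, 38, 38, 38, 38, 55.5, 55.5, 38, 38, 38),
  child_age = c(8, 8, 13, 3.5, 3.5, 13, 8, 8, 8, 13),
  freq      = c(0.1, 12, 0.1, 0.1, 0.1, 0.1, 1, 2, 0.1, 0.1)
)

# Formula uses column names only; the data frame goes in `data =`
m1 <- lm(adult_age ~ child_age, data = data)

# newdata is honoured: one prediction per row of newdata
p1 <- predict(m1, newdata = data.frame(child_age = c(0, 10)))
length(p1)  # 2

# At child_age = 0 the prediction is simply the intercept
all.equal(unname(p1[1]), unname(coef(m1)[1]))  # TRUE
```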

How to do this with ctree()

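If you decide to shoot yourself in the foot with the data$ approach anyway, you can use the new (and recommended) implementation of ctree() in the partykit package. Its entire formula/data handling was rewritten using standard non-standard evaluation: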

library("partykit")
ctree(adult_age ~ child_age + freq, data = data)
## Model formula:
## adult_age ~ child_age + freq
## 
## Fitted party:
## [1] root: 41.500 (n = 10, err = 490.0) 
## 
## Number of inner nodes:    0
## Number of terminal nodes: 1
ctree(data$adult_age ~ data$child_age + data$freq)
## Model formula:
## data$adult_age ~ data$child_age + data$freq
## 
## Fitted party:
## [1] root: 41.500 (n = 10, err = 490.0) 
## 
## Number of inner nodes:    0
## Number of terminal nodes: 1

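Don't do this. Relying on scoping is bad practice and can lead to strange bugs. Most package developers assume that you use the data argument. Combine your data.frames (that should be easy).

Why is this "bad practice"? For years I have used $ in calls to lm and other functions without any problems. I ran into problems when trying to use the syntax with a comma between the column names and the data frame name.

I really like the phrase "standard non-standard evaluation" :)

I hoped you would :-)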
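Combining the data frames, as Roland suggests, does not have to be permanent. A minimal sketch, assuming hypothetical data frames adults, children and visits whose rows are already in the same order: build a temporary data frame inline and pass it via data =. lm() is used so the sketch runs with base R only; the commented ctree() line uses the call shape that the question reports as working.

```r
# Hypothetical separate data frames; the names `adults`, `children` and
# `visits` are made up and assume the rows are already aligned
adults   <- data.frame(adult_age = c(38, 38, 38, 38, 38, 55.5, 55.5, 38, 38, 38))
children <- data.frame(child_age = c(8, 8, 13, 3.5, 3.5, 13, 8, 8, 8, 13))
visits   <- data.frame(freq = c(0.1, 12, 0.1, 0.1, 0.1, 0.1, 1, 2, 0.1, 0.1))

# Temporary data frame used only for the model call
model_data <- data.frame(
  adult_age = adults$adult_age,
  child_age = children$child_age,
  freq      = visits$freq
)

# Plain column names in the formula, data passed via `data =`
fit <- lm(adult_age ~ child_age + freq, data = model_data)

# Same call shape for a conditional inference tree:
# party::ctree(adult_age ~ child_age + freq, data = model_data)
```

If the data frames are linked by an ID column rather than by row order, merge() is the safer way to line them up before fitting.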
