R: getting predictions after imputation


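I'm doing some modelling with the randomForest package. The rfImpute function is great for handling missing values when fitting a model. However, is there a way to get predictions for new cases that have missing values?

The following is based on the example in ?rfImpute: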

library(randomForest)

iris.na <- iris

set.seed(111)
## artificially drop some data values.
for (i in 1:4) iris.na[sample(150, sample(20)), i] <- NA

## impute the dropped values
set.seed(222)
iris.imputed <- rfImpute(Species ~ ., iris.na)

## fit the model
set.seed(333)
iris.rf <- randomForest(Species ~ ., iris.imputed)

# now try to predict for a case where a variable is missing
> predict(iris.rf, iris.na[148, , drop=FALSE])
[1] <NA>
Levels: setosa versicolor virginica
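The <NA> is documented behaviour rather than a bug: the ?predict.randomForest help page notes that, for a forest fitted with a formula, rows of newdata containing NA are silently omitted and the corresponding predictions are NA. A minimal sketch, assuming the randomForest package is installed and, for brevity, a forest fitted on the complete iris data (not the imputed data above), shows that only the incomplete row comes back as NA:

```r
library(randomForest)

# Setup assumed for this sketch: a forest fitted on the complete iris data.
set.seed(333)
iris.rf <- randomForest(Species ~ ., data = iris)

# Two new cases: the first complete, the second with a missing Sepal.Width.
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
newdata <- iris[c(1, 148), predictors]
newdata$Sepal.Width[2] <- NA

# The incomplete row is dropped internally and its prediction is NA.
pred <- predict(iris.rf, newdata)
print(pred)
```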

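This may not be the clean solution you are looking for, but here is a way forward. The problem is two-fold:
1) The NA values of the new case need to be imputed with the same imputation scheme that produced the original imputed data.
2) The outcome then needs to be predicted from those imputed values using the original random forest, not a newly fitted one.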
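1:
Append the new observation to the imputed (not the original) dataset, i.e. make use of the imputed data you already have, and impute the new missing values. The newly imputed values will not match those imputed for the original observation (nor should they):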
iris.na2 <- rbind(iris.imputed, iris.na[148, , drop = FALSE])
iris.imputed2 <- rfImpute(Species ~ ., iris.na2)
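> tail(iris.imputed, 3)
      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
148 virginica          6.5    3.019279          5.2         2.0
149 virginica          6.2    3.400000          5.4         2.3
150 virginica          5.9    3.000000          5.1         1.8
> tail(iris.imputed2, 4)
       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
148  virginica          6.5    3.019279          5.2         2.0
149  virginica          6.2    3.400000          5.4         2.3
150  virginica          5.9    3.000000          5.1         1.8
1481 virginica          6.5    3.023392          5.2         2.0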
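Step 1 can be wrapped in a small function so it can be reused for any batch of new rows. This is a sketch assuming the randomForest package is installed; `impute_new_rows()` is a hypothetical helper, not part of randomForest, and the training gaps are created with a fixed 15 NAs per predictor (rather than the question's `sample(20)`) so the example does not hinge on exactly which rows get blanked. Because rfImpute() does not allow NAs in the response, the new rows must have a known class.

```r
library(randomForest)

# Setup assumed for this sketch: iris with 15 artificial NAs per predictor.
set.seed(111)
iris.na <- iris
for (i in 1:4) iris.na[sample(150, 15), i] <- NA

set.seed(222)
iris.imputed <- rfImpute(Species ~ ., iris.na)

# Hypothetical helper (not part of randomForest): append new rows to the
# already-imputed training data, re-run rfImpute() on the combined frame and
# return only the rows that were appended.
impute_new_rows <- function(imputed, newdata, form, seed = 1) {
  # Match the column order of the imputed frame (response first).
  newdata <- newdata[, names(imputed), drop = FALSE]
  if (!anyNA(newdata)) return(newdata)  # nothing to impute
  combined <- rbind(imputed, newdata)
  set.seed(seed)
  reimputed <- rfImpute(form, combined)
  reimputed[(nrow(imputed) + 1):nrow(reimputed), , drop = FALSE]
}

# A new case whose Sepal.Width is missing.
new_case <- iris[148, ]
new_case$Sepal.Width <- NA
new_imputed <- impute_new_rows(iris.imputed, new_case, Species ~ .)
print(new_imputed)
```

Re-imputing against the full imputed training set, instead of imputing the new row on its own, is what lets the proximities from the existing data inform the new value.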
2:
Use the information in the original random forest to predict the newly imputed observation:
predict(iris.rf, iris.imputed2[151, ])
     1481 
virginica 
Levels: setosa versicolor virginica
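Putting both steps together, a sketch (assuming randomForest is installed, with the same fixed-size NA setup as an assumption) that keeps the forest fitted on the original imputed data and only scores the re-imputed new row with it. `type = "prob"` is a standard option of predict.randomForest and returns the share of tree votes per class, which is often more informative than the single class label.

```r
library(randomForest)

# Setup assumed for this sketch: training data with artificial gaps,
# imputed once, and a forest fitted on it that stays fixed afterwards.
set.seed(111)
iris.na <- iris
for (i in 1:4) iris.na[sample(150, 15), i] <- NA
set.seed(222)
iris.imputed <- rfImpute(Species ~ ., iris.na)
set.seed(333)
iris.rf <- randomForest(Species ~ ., iris.imputed)

# Step 1: append a new case with a missing predictor to the imputed data,
# re-impute, and keep only the appended (last) row.
new_case <- iris[148, names(iris.imputed)]
new_case$Sepal.Width <- NA
set.seed(444)
iris.imputed2 <- rfImpute(Species ~ ., rbind(iris.imputed, new_case))
new_row <- iris.imputed2[nrow(iris.imputed2), , drop = FALSE]

# Step 2: score the imputed row with the ORIGINAL forest (no refit).
pred_class <- predict(iris.rf, new_row)
pred_prob  <- predict(iris.rf, new_row, type = "prob")
print(pred_class)
print(pred_prob)
```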
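The variance will be a problem: when already-imputed data are used to impute another data point, the uncertainty carried by those earlier imputations is not taken into account, so the result looks more certain than it is. One way around this is bootstrapping.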
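The bootstrap idea could be set up roughly as follows. This is one assumed arrangement, not the answer author's code: resample the imputed training rows with replacement, append the new case, re-impute it, and score it with the original forest each time; the spread of the predicted class probabilities then reflects the imputation uncertainty for the new case. B = 20, ntree = 100 and iter = 3 are arbitrary values to keep it quick, and because the forest is not refitted per resample, model uncertainty is not captured.

```r
library(randomForest)

# Setup assumed for this sketch (same as the question, fixed-size NA pattern).
set.seed(111)
iris.na <- iris
for (i in 1:4) iris.na[sample(150, 15), i] <- NA
set.seed(222)
iris.imputed <- rfImpute(Species ~ ., iris.na)
set.seed(333)
iris.rf <- randomForest(Species ~ ., iris.imputed)

new_case <- iris[148, names(iris.imputed)]
new_case$Sepal.Width <- NA

B <- 20
set.seed(555)
boot_prob <- t(replicate(B, {
  # Bootstrap the imputed training rows, append the new case, re-impute.
  idx <- sample(nrow(iris.imputed), replace = TRUE)
  boot_data <- rbind(iris.imputed[idx, ], new_case)
  imputed <- rfImpute(Species ~ ., boot_data, iter = 3, ntree = 100)
  # Score the re-imputed new case (last row) with the original forest.
  predict(iris.rf, imputed[nrow(imputed), , drop = FALSE], type = "prob")[1, ]
}))

# Mean and standard deviation of each class probability across resamples.
print(colMeans(boot_prob))
print(apply(boot_prob, 2, sd))
```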
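The approach also works if the dependent variable is missing (predict does not care about the dependent variable, so you can also just pass a matrix of independent variables):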
> missY <- cbind(NA, iris.imputed2[151, 2:5])
> missY
     NA Sepal.Length Sepal.Width Petal.Length Petal.Width
1481 NA          6.5    3.023392          5.2           2
> predict(iris.rf, missY)
     1481 
virginica 
Levels: setosa versicolor virginica
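A short sketch of the point about the response, assuming randomForest is installed and, for brevity, a forest fitted on the complete iris data. For a formula-fitted forest, predict() builds its model frame from the predictor terms only, so newdata can omit Species entirely, be a plain numeric matrix, or carry an NA Species without the row being dropped.

```r
library(randomForest)

# Setup assumed for this sketch: a forest fitted on the complete iris data.
set.seed(333)
iris.rf <- randomForest(Species ~ ., data = iris)

predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# 1) A data frame of predictors only, with no Species column at all.
pred_df <- predict(iris.rf, iris[148, predictors])

# 2) A numeric matrix of predictors (converted to a data frame internally).
pred_mat <- predict(iris.rf, as.matrix(iris[148, predictors]))

# 3) Species present but unknown: the response is not part of the model
#    frame used for prediction, so its NA does not cause the row to be dropped.
with_na_y <- iris[148, c("Species", predictors)]
with_na_y$Species <- NA
pred_na_y <- predict(iris.rf, with_na_y)

print(pred_df)
print(pred_mat)
print(pred_na_y)
```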
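Four years later: the rxDForest function that ships with Microsoft R Server/Client can obtain predicted values for cases with missing values. This is because rxDForest uses the same underlying code as rxDTree to fit the individual decision trees, and so benefits from the latter's ability to create surrogate variables.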
iris.na <- iris

set.seed(111)
## artificially drop some data values.
for (i in 1:4) iris.na[sample(150, sample(20)), i] <- NA


library(RevoScaleR)

# rxDForest doesn't support dot-notation for formulas
iris.rxf <- rxDForest(Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width,
    data=iris.na, nTree=100)

pred <- rxPredict(iris.rxf, iris.na)  # not predict()

table(pred)
#    setosa versicolor  virginica 
#        50         48         52 
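RevoScaleR is not freely available, so as a stand-in the surrogate-split idea can be seen with a single tree from the rpart package (a recommended package shipped with standard R builds). This is a swapped-in illustration of how surrogate splits handle missing predictors in one tree, not rxDForest itself; the fixed 15-NAs-per-predictor setup is an assumption of the sketch.

```r
library(rpart)

# Setup assumed for this sketch: iris with 15 artificial NAs per predictor.
set.seed(111)
iris.na <- iris
for (i in 1:4) iris.na[sample(150, 15), i] <- NA

# One classification tree; rpart's default na.action (na.rpart) keeps rows
# with some missing predictors and computes surrogate splits for each node.
fit <- rpart(Species ~ ., data = iris.na, method = "class")

# Predicting on data that still contains NAs: a row whose primary split
# variable is missing is routed by a surrogate split (default
# usesurrogate = 2), so every row gets a class.
pred <- predict(fit, iris.na, type = "class")
print(table(pred, useNA = "ifany"))
```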

Glad you found a better way! Thanks for the reply.