R random forest out-of-bag variable importance
I am trying to understand how %Var can exceed 100. I am using this script:
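library(randomForest)

# Build the input file path and read the tab-separated data
start <- "B_fixed"
suffix <- ".txt"
dataDir <- "/Users/Desktop/"
mod1 <- read.table(paste0(dataDir, start, suffix), sep = "\t", header = TRUE)

# Regress Ksat_f on all other columns among the first 14 columns of the data
form <- as.formula(Ksat_f ~ .)
Ksat_rf <- randomForest(form, data = mod1[1:14], na.action = na.omit, ntree = 1000,
                        replace = FALSE, importance = TRUE, do.trace = 50,
                        keep.forest = TRUE, keep.inbag = TRUE)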
This uses 14 variables. If I use a single variable, %Var can reach 145%.
Any help is appreciated.
Thanks
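-t

The 145% figure tells you that your model is far more wrong than right. I admit this is a bit confusing.
%Var(y) is the out-of-bag error (MSE) expressed as a percentage of the total variance of the target, whereas % Var explained is the percentage of variance explained by the model.
Note: 105.15% + (-5.15%) = 100%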
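The relationship between the two percentages can be written out as follows. This is a sketch of my understanding rather than a quote from the package documentation: the symbols $\mathrm{MSE}_{\mathrm{OOB}}$ (out-of-bag mean squared error) and $\hat{\sigma}^2_y$ (variance of the target, which I believe randomForest computes with divisor $n$) are notation introduced here.

```latex
\[
\%\mathrm{Var}(y) = 100 \cdot \frac{\mathrm{MSE}_{\mathrm{OOB}}}{\hat{\sigma}^2_y},
\qquad
\%\,\mathrm{Var\ explained} = 100 \left( 1 - \frac{\mathrm{MSE}_{\mathrm{OOB}}}{\hat{\sigma}^2_y} \right)
\]
\[
\%\mathrm{Var}(y) + \%\,\mathrm{Var\ explained} = 100
\]
```

Under this reading, a %Var(y) of 145% corresponds to a % Var explained of 100 - 145 = -45%.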
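In the reproducible example below, I shuffle/permute the target (y), so the RF model has no chance of predicting it. You will see that it performs very poorly: the error exceeds 100% and the explained variance is below 0%. At 0% explained variance, your model is exactly as accurate as predicting the grand mean for every observation.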
set.seed(1)
library(randomForest)
X <- data.frame(replicate(5, rnorm(1000)))
y <- apply(X, 1, sum)
y <- sample(y)  # shuffle the target so it no longer depends on X
Data <- data.frame(X, y)
form <- as.formula(y ~ .)
rf <- randomForest(form, data = Data, na.action = na.omit,
                   ntree = 1000, replace = F, importance = T,
                   do.trace = 50, keep.forest = T, keep.inbag = T)
     |      Out-of-bag   |
Tree |      MSE  %Var(y) |
  50 |     5.81   108.91 |
 100 |    5.671   106.31 |
 150 |    5.651   105.95 |
1000 |    5.609   105.15 |
print(rf)
Call:
randomForest(formula = form, data = Data, ntree = 1000, replace = F, importance = T, do.trace = 50, keep.forest = T, keep.inbag = T, na.action = na.omit)
Type of random forest: regression
Number of trees: 1000
No. of variables tried at each split: 1
Mean of squared residuals: 5.608769
% Var explained: -5.15
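To make the "grand mean" baseline concrete without any forest at all, here is a minimal base-R sketch. The helper `pct_var` and the variables `baseline` and `shuffled` are hypothetical names introduced for illustration, and the variance is computed with divisor n as an assumption about how the percentage is normalised.

```r
set.seed(1)
y <- rnorm(1000)

# Reference variance of the target (divisor n, an assumption of this sketch)
var_y <- mean((y - mean(y))^2)

# Hypothetical helper: squared error as a percentage of the target variance
pct_var <- function(pred, y) 100 * mean((y - pred)^2) / var_y

baseline <- rep(mean(y), length(y))  # always predict the grand mean
shuffled <- sample(y)                # "predictions" unrelated to the target

pct_baseline <- pct_var(baseline, y)
pct_shuffled <- pct_var(shuffled, y)

cat("mean predictor:    ", round(pct_baseline, 2), "% of Var(y)\n")
cat("shuffled predictor:", round(pct_shuffled, 2), "% of Var(y)\n")
```

The mean predictor lands at 100% of Var(y), i.e. 0% explained, while predictions unrelated to the target land well above 100%, which is the same pattern as the OOB trace above.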
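Please edit your post and make it a complete reproducible example (starting with the library calls that load the required packages). What does your y-var look like?
@triBaker Your edit is close, but not fully reproducible. You cannot refer to a data set that only you have :)
You're welcome :) By the way, this is the explained variance of the out-of-bag samples, not variable importance. Variable importance (RF regression) is the decrease in out-of-bag explained variance caused by permuting a given variable after training, before prediction.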
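To see the difference between the two quantities mentioned in the last comment, a short sketch follows. It assumes the randomForest package is installed and uses an unshuffled synthetic target, so the numbers are illustrative only. `importance(rf, type = 1)` returns the permutation-based measure (labelled %IncMSE for regression, scaled by default), while `rf$rsq` holds the out-of-bag explained variance per number of trees.

```r
library(randomForest)
set.seed(1)

# Synthetic data: the target is the sum of the predictors (not shuffled here)
X <- data.frame(replicate(5, rnorm(500)))
y <- rowSums(X)
Data <- data.frame(X, y)

rf <- randomForest(y ~ ., data = Data, ntree = 200, importance = TRUE)

# Permutation-based variable importance, one row per predictor
imp <- importance(rf, type = 1)
print(imp)

# Out-of-bag explained variance after the last tree, as a percentage
oob_explained <- 100 * tail(rf$rsq, 1)
cat("OOB % Var explained:", round(oob_explained, 2), "\n")
```

The first output is one value per predictor; the second is a single model-level figure, which is what the `% Var explained` line in `print(rf)` reports.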