How to calculate GBM accuracy in R


I am using the gbm() function to build a model, and I would like to get its accuracy. Here is my code:

df<-read.csv("http://freakonometrics.free.fr/german_credit.csv", header=TRUE)

str(df)

F=c(1,2,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20,21)
for(i in F) df[,i]=as.factor(df[,i])

library(caret)

set.seed(1000)
intrain<-createDataPartition(y=df$Creditability, p=0.7, list=FALSE)
train<-df[intrain, ]
test<-df[-intrain, ]

install.packages("gbm")
library("gbm")

df_boosting<-gbm(Creditability~.,distribution = "bernoulli", n.trees=100, verbose=TRUE, interaction.depth=4,
                 shrinkage=0.01, data=train)
summary(df_boosting)

yhat.boost<-predict (df_boosting ,newdata =test, n.trees=100)
mean((yhat.boost-test$Creditability)^2) 

Also, when I use the mean() function to measure the MSE, I get the following warning:

Warning message:
In Ops.factor(yhat.boost, test$Creditability) :
  ‘-’ not meaningful for factors

Do you know why these errors occur? Thanks in advance.

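The problem in your code lies in how the (binary) response variable Creditability is defined. You declared it as a factor, but gbm requires a numeric response variable (coded 0/1 when distribution = "bernoulli").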
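
If the response has already been turned into a factor, as in the question, it can be converted back to numeric. The following is a minimal sketch on toy data (the vector f is made up for illustration and is not taken from the credit data set). It shows why the conversion should go through as.character(): calling as.numeric() directly on a factor returns the internal level codes, not the original 0/1 values.

```r
# A 0/1 response that was stored as a factor (toy data, not the credit set)
f <- factor(c(1, 0, 1, 1, 0))

codes  <- as.numeric(f)                # internal level codes: 2 1 2 2 1
values <- as.numeric(as.character(f))  # original values:      1 0 1 1 0

print(codes)
print(values)
```

In the question's code, the same idea would apply to df$Creditability, or column 1 can simply be left out of the factor conversion.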

Here is the corrected code, which leaves column 1 (Creditability) out of the factor conversion:

df <- read.csv("http://freakonometrics.free.fr/german_credit.csv", header=TRUE)

F <- c(2,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20,21)
for(i in F) df[,i]=as.factor(df[,i])
str(df)

... and the rest of the code runs fine:

library(caret)
set.seed(1000)
intrain <- createDataPartition(y=df$Creditability, p=0.7, list=FALSE)
train <- df[intrain, ]
test <- df[-intrain, ]

library("gbm")
df_boosting <- gbm(Creditability~., distribution = "bernoulli", 
       n.trees=100, verbose=TRUE, interaction.depth=4,
       shrinkage=0.01, data=train)
par(mar=c(3,14,1,1))
summary(df_boosting, las=2)

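Why should I change the type of the Creditability variable? It is a factor variable made up of 0s and 1s. Also, is there a way to get the accuracy as a percentage instead of the MSE? Or is MSE the only way to measure accuracy?

@신익수 I changed Creditability from factor to numeric only because gbm requires it. I had not thought about how to evaluate the predictive performance of the gbm model. In any case, MSE is not an appropriate measure here. I would suggest, for example, a method based on the ROC curve.

@Macro Sandri So, to run gbm in R, does the target (dependent) variable have to be changed to numeric rather than categorical? But this data is about classification, not regression.

With the option distribution="bernoulli", gbm knows that the response variable has to be treated as a binary categorical variable.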
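
The ROC-based suggestion in the comments can be sketched as follows. This is only an illustration in base R: the helper name auc_rank and the toy vectors are hypothetical, and the area under the ROC curve is computed with the rank (Mann-Whitney) formula. With the fitted model, the scores could presumably come from predict(df_boosting, newdata = test, n.trees = 100) and the labels from test$Creditability.

```r
# Hypothetical helper: area under the ROC curve via the rank-sum formula.
# score  : numeric predictions (higher means more likely class 1)
# actual : true labels coded 0/1
auc_rank <- function(score, actual) {
  r  <- rank(score)            # average ranks, so ties are handled
  n1 <- sum(actual == 1)       # number of positives
  n0 <- sum(actual == 0)       # number of negatives
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy values for illustration only
score_toy  <- c(0.9, 0.2, 0.7, 0.4, 0.6)
actual_toy <- c(1,   0,   1,   1,   0)

auc_toy <- auc_rank(score_toy, actual_toy)
print(auc_toy)   # 5 of the 6 positive/negative pairs are ordered correctly
```

The AUC depends only on the ordering of the scores, so it gives the same value whether the predictions are on the log-odds scale or the probability scale.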
Output of str(df):

'data.frame':   1000 obs. of  21 variables:
 $ Creditability                    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Account.Balance                  : Factor w/ 4 levels "1","2","3","4": 1 1 2 1 1 1 1 1 4 2 ...
 $ Duration.of.Credit..month.       : int  18 9 12 12 12 10 8 6 18 24 ...
 $ Payment.Status.of.Previous.Credit: Factor w/ 5 levels "0","1","2","3",..: 5 5 3 5 5 5 5 5 5 3 ...
 $ Purpose                          : Factor w/ 10 levels "0","1","2","3",..: 3 1 9 1 1 1 1 1 4 4 ...
 ...
Output of summary(df_boosting, las=2):
##########
                                                                var    rel.inf
Account.Balance                                     Account.Balance 36.8578980
Credit.Amount                                         Credit.Amount 12.0691120
Duration.of.Credit..month.               Duration.of.Credit..month. 10.5359895
Purpose                                                     Purpose 10.2691646
Payment.Status.of.Previous.Credit Payment.Status.of.Previous.Credit  9.1296524
Value.Savings.Stocks                           Value.Savings.Stocks  4.9620662
Instalment.per.cent                             Instalment.per.cent  3.3124252
...
##########

yhat.boost <- predict(df_boosting , newdata=test, n.trees=100)
mean((yhat.boost-test$Creditability)^2) 

[1] 0.2719788
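
To get the accuracy as a percentage, which is what the question asks for, one option is to turn predicted probabilities into class labels and count the matches. Note that predict() on a gbm object returns values on the link scale (log-odds for "bernoulli") by default, so the MSE above compares log-odds with 0/1 labels; passing type = "response" returns probabilities instead. The sketch below is a hedged illustration: accuracy_pct is a hypothetical helper, the 0.5 cutoff is an assumption, and the toy vectors stand in for the real predictions.

```r
# Hypothetical helper: accuracy in percent from predicted probabilities
# and true 0/1 labels. The default cutoff of 0.5 is an assumption.
accuracy_pct <- function(prob, actual, cutoff = 0.5) {
  pred_class <- as.integer(prob > cutoff)   # 1 if probability exceeds cutoff
  100 * mean(pred_class == actual)          # share of correct predictions
}

# Toy values standing in for
#   predict(df_boosting, newdata = test, n.trees = 100, type = "response")
# and test$Creditability
prob_toy   <- c(0.9, 0.2, 0.7, 0.4, 0.6)
actual_toy <- c(1,   0,   1,   1,   0)

acc_toy <- accuracy_pct(prob_toy, actual_toy)
print(acc_toy)   # 3 of 5 predictions match, i.e. 60

# Confusion matrix for the same toy predictions
print(table(Predicted = as.integer(prob_toy > 0.5), Actual = actual_toy))
```

Since caret is already loaded in the code above, its confusionMatrix() function is another way to obtain accuracy and related statistics from factor-coded predictions and labels.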