How to use Cohen's kappa as an evaluation metric with LightGBM in R?

Tags: r, xgboost, lightgbm

I regularly use XGBoost in R and would like to start using LightGBM on the same data. My goal is to use Cohen's kappa as the evaluation metric. However, I cannot get LightGBM to work correctly: no learning seems to take place. As a very simple example, I will use the Titanic dataset.

library(data.table)
library(dplyr)
library(caret)

titanic <- fread("https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv")

titanic_complete <- titanic %>%
   select(survived, pclass, sex, age, sibsp, parch, fare, embarked) %>% 
   mutate_if(is.character, as.factor) %>%
   mutate(survived = as.factor(survived)) %>% 
   na.omit()

train_class <- titanic_complete %>% 
   select(survived) %>% 
   pull()

train_numeric <- titanic_complete %>% 
   select_if(is.numeric) %>% 
   data.matrix()

ctrl <- trainControl(method = "none", search = "grid")

tune_grid_xgbTree <- expand.grid(
   nrounds = 700,
   eta = 0.1,
   max_depth = 3,
   gamma = 0,
   colsample_bytree = 0,
   min_child_weight = 1,
   subsample = 1)

 set.seed(512)
 fit_xgb <- train(
    x = train_numeric,
    y = train_class,
    tuneGrid = tune_grid_xgbTree,
    trControl = ctrl,
    method = "xgbTree",
    metric = "Kappa",
    verbose = TRUE)

 confusionMatrix(predict(fit_xgb, train_numeric), train_class)
No learning occurs, and the algorithm outputs "No further splits with positive gain, best gain: -inf"

This happens because LightGBM's default parameters are configured for much larger datasets; the training set in the example above has only 714 rows. To work around this, I suggest setting LightGBM's parameters to values that allow smaller leaf nodes, and limiting the number of leaves rather than the depth:

list(
   "min_data_in_leaf" = 3
   , "max_depth" = -1
   , "num_leaves" = 8
)
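To make this concrete, here is a minimal sketch of how those values could be passed to lgb.train(). The X_train and y_train objects are the design matrix and 0/1 label built from titanic_complete further down in this answer, and the learning rate and round count here are illustrative assumptions, not part of the original answer.

library(lightgbm)

# Sketch only: assumes X_train / y_train as constructed below.
dtrain <- lgb.Dataset(data = X_train, label = y_train)

params <- list(
   objective = "binary",
   learning_rate = 0.1,
   min_data_in_leaf = 3,   # allow very small leaves (default is 20)
   max_depth = -1,         # no depth limit ...
   num_leaves = 8          # ... cap complexity by leaf count instead
)

fit_small <- lgb.train(params = params, data = dtrain, nrounds = 100)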
Kappa = 0

I believe your implementation of Cohen's kappa is wrong. The input to e1071::classAgreement() should be a table of counts (a confusion matrix), whereas preds arrives as a vector of predicted probabilities. A quick illustration of the expected input is sketched below, followed by an implementation I believe is correct.

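As a self-contained illustration (the numbers here are invented for the example), classAgreement() expects a cross-tabulation of counts, so predicted probabilities must first be converted to class labels:

library(e1071)

actual <- c(0, 0, 0, 1, 1, 1)
probs  <- c(0.2, 0.4, 0.7, 0.3, 0.8, 0.9)   # raw predicted probabilities

# Wrong: tabulating raw probabilities produces a 2x6 table of
# mostly-zero cells, not a confusion matrix.
table(actual, probs)

# Right: threshold first, then cross-tabulate counts.
preds <- as.integer(probs > 0.5)
e1071::classAgreement(table(actual, preds))$kappa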
library(lightgbm)
lgb.kappa <- function(preds, y) {
   label <- getinfo(y, "label")
   # preds arrive as probabilities; convert them to 0/1 class labels
   # before building the confusion matrix, per the discussion above
   preds <- as.integer(preds > 0.5)
   k <- unlist(e1071::classAgreement(table(label, preds)))["kappa"]
   return(list(name = "kappa", value = as.numeric(k), higher_better = TRUE))
}

X_train <- titanic_complete %>% select(-survived) %>% data.matrix()
y_train <- titanic_complete %>% select(survived) %>% data.matrix()
y_train <- y_train - 1

dtrain <- lgb.Dataset(data = X_train, label = y_train)
fit_lgbm <- lgb.train(data = dtrain,
                  objective = "binary",
                  learning_rate = 0.1,
                  nrounds = 700,
                  valids = list(train = dtrain),  # eval metrics are only reported on valids
                  colsample_bytree = 1,           # 0 (as in the question) is not a valid feature fraction
                  eval = lgb.kappa,
                  min_child_weight = 1,
                  max_depth = 3)
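As a quick check on the fitted model (a sketch assuming the fit_lgbm, X_train and y_train objects above), the same kappa can be computed from in-sample predictions:

# predict() returns probabilities for a binary objective
probs <- predict(fit_lgbm, X_train)
pred_class <- as.integer(probs > 0.5)

# Cross-tabulate counts, then extract kappa
e1071::classAgreement(table(y_train, pred_class))$kappa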