R caret: How to use class weights and down-sampling together to handle class imbalance?

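I have a highly imbalanced data set. To deal with this, I tried several class-imbalance techniques separately: down-sampling, class weights, and threshold adjustment. Of these, threshold adjustment performed worst. Using downSample alone or class weights alone, I did not get good enough results: either too many false positives or too many false negatives. So I would like to combine the two techniques. Here is what I tried: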

library(caret)  # createFolds(), trainControl(), train()

# produce some reproducible imbalanced data
set.seed(12345)
y <- as.factor(sample(c("M", "F"),
                      prob = c(0.1, 0.9),
                      size = 10000,
                      replace = TRUE))


x <- rnorm(10000)


DATA <- data.frame(y = as.factor(y), x)

set.seed(12345)
# stratified 10-fold indices; returnTrain = TRUE gives the training rows of each fold
folds <- createFolds(DATA$y, k = 10, 
                     list = TRUE, returnTrain = TRUE)

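# class weights: "M" gets a total weight of k, "F" a total weight of 1 - k.
# Index the table by level name: the factor levels sort as "F", "M",
# so positional [1] would pick the count of "F", not "M".
k <- 0.5
classWeights <- ifelse(DATA$y == "M",
                       (1/table(DATA$y)[["M"]]) * k,
                       (1/table(DATA$y)[["F"]]) * (1-k))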
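As a quick sanity check on the weighting scheme above, the weights can be summed per class: if they are built as intended, class "M" should total k and class "F" should total 1 - k. This is only a minimal sketch that rebuilds the same simulated labels; the names `counts`, `w`, and `weightSums` are illustrative and not part of the original code.

```r
# Rebuild the simulated labels from the question
set.seed(12345)
y <- factor(sample(c("M", "F"), size = 10000, replace = TRUE, prob = c(0.1, 0.9)))

k <- 0.5
counts <- table(y)  # named counts; levels sort alphabetically as "F", "M"

# One weight per observation, looked up by class name rather than position
w <- ifelse(y == "M", k / counts[["M"]], (1 - k) / counts[["F"]])

# Total weight carried by each class
weightSums <- tapply(w, y, sum)
print(weightSums)
```

Looking the counts up by name avoids depending on the order of the factor levels.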
Training with these weights alone works fine, without errors. But when I add the sampling argument to trainControl, as follows:

# train parameters
set.seed(12345)
traincontrol <- trainControl(method = "cv", # resampling method: 10-fold CV, matching the folds passed as index
                             number = 10,
                             index = folds,
                             classProbs = TRUE, 
                             summaryFunction = twoClassSummary,
                             savePredictions = TRUE,
                             sampling = "down"
                             )

# `algorithm` holds the name of the caret method; its value is not shown here
fitModel <- train(y ~ .,
                  data = DATA, 
                  trControl = traincontrol,
                  method = algorithm,
                  metric = "ROC",
                  weights = classWeights
                  )
I get the error below. Is there a way to do this in caret? Many thanks.

Error in model.frame.default(formula = .outcome ~ ., data = list(x = c(-0.0640913631047556,  : 
  variable lengths differ (found for '(weights)')
In addition: There were 11 warnings (use warnings() to see them)
Timing stopped at: 0.112 0.001 0.115
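
One plausible reading of "variable lengths differ (found for '(weights)')" is that the weights vector still has one entry per original row, while down-sampling has already shrunk the data the model is fitted on. The sketch below illustrates that mismatch outside of train(), using caret's downSample() on the same simulated data. It is an assumption about the cause, not a confirmed description of caret's internals, and the names `down` and `downWeights` are illustrative.

```r
library(caret)

# Same simulated data as in the question
set.seed(12345)
y <- factor(sample(c("M", "F"), size = 10000, replace = TRUE, prob = c(0.1, 0.9)))
DATA <- data.frame(y = y, x = rnorm(10000))

# One weight per row of the full data set
k <- 0.5
classWeights <- ifelse(DATA$y == "M",
                       k / sum(DATA$y == "M"),
                       (1 - k) / sum(DATA$y == "F"))

# downSample() keeps every minority row and an equal number of majority rows
down <- downSample(x = DATA[, "x", drop = FALSE], y = DATA$y, yname = "y")

# The weights no longer line up with the down-sampled rows
print(c(rows_full = nrow(DATA), rows_down = nrow(down), n_weights = length(classWeights)))

# Weights recomputed on the down-sampled frame do have a matching length
downWeights <- ifelse(down$y == "M",
                      k / sum(down$y == "M"),
                      (1 - k) / sum(down$y == "F"))
print(length(downWeights) == nrow(down))
```

Recomputing the weights on a frame that was down-sampled by hand makes the lengths agree, but it moves the down-sampling outside the resampling loop, so the held-out folds would no longer reflect the original class ratio. Whether that trade-off is acceptable depends on the use case.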