Correlation cutoff in R caret train preprocessing


I am building a C5.0 model using the caret package in R:

control <- trainControl(method = "repeatedcv", 
                    number = 10, 
                    repeats = 3, 
                    classProbs = TRUE, 
                    sampling = 'smote',
                    returnResamp="all",
                    summaryFunction = twoClassSummary)

grid <- expand.grid(.winnow = c(FALSE, TRUE), 
                 .trials = c(1, 5,10,15,20,25,30,40,45,50), 
                 .model= c("tree"),
                 .splits=c(2,5,10,15,20,25,50))

c5_model <- train(label ~ .,
              data = train,
              trControl = control, 
              method = c5info,
              tuneGrid = grid, 
              preProcess = c("center", "scale", "nzv","corr"),
              verbose = FALSE)

You can specify preprocessing options in trainControl, through its preProcOptions argument:
library(caret)
library(mlbench) #for the data
data(Sonar)

ctrl <-trainControl(method = "repeatedcv", 
                    number = 10, 
                    repeats = 3, 
                    classProbs = TRUE, 
                    sampling = 'smote',
                    returnResamp="all",
                    summaryFunction = twoClassSummary,
                    preProcOptions = list(cutoff = 0.75)) # all go in this list

This seems to work.
Similarly, you can pass any other preprocessing option through the same list. To check all of them, see:
?caret::preProcess
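To see what the cutoff does without fitting a model, here is a minimal sketch. It assumes the Sonar data used above and that caret and mlbench are installed. findCorrelation and the cutoff argument of preProcess are part of caret; the variable names (x, cm, drop_060, drop_090, pp) are only illustrative.

```r
library(caret)
library(mlbench)  # for the Sonar data
data(Sonar)

# Predictors only: drop the outcome column
x <- Sonar[, setdiff(names(Sonar), "Class")]

# Correlation matrix of the predictors
cm <- cor(x)

# Column indices that findCorrelation flags for removal at two cutoffs
drop_060 <- findCorrelation(cm, cutoff = 0.6)
drop_090 <- findCorrelation(cm, cutoff = 0.9)

# A lower cutoff is stricter, so it should flag at least as many predictors here
length(drop_060)
length(drop_090)

# The same filter applied through preProcess, which takes cutoff directly
pp <- preProcess(x, method = c("center", "scale", "nzv", "corr"), cutoff = 0.6)
pp
```

Inside train the same value is supplied through preProcOptions in trainControl, as in the answer above. The counts will not match those reported by train exactly, because train applies the filter within each resample after sampling.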
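The initial problem is solved, but when the model runs it gives an error: "Error in findCorrelation_fast(x = x, cutoff = cutoff, verbose = verbose) : The correlation matrix has some missing values." How can I instruct the function to use pairwise complete correlations?

Are there any NA values in the data? If so, try removing or imputing them. Does the problem persist? Does passing na.action = na.pass to train solve it?

I am sure there are no NAs in the data... strange?

It is hard to troubleshoot this behaviour without the data. What I would do is run preProcess with "corr" directly on the data, outside of train, to see whether it works. If it does not, I would try the R function cor. If that still does not work, I would try to find out why. If I could not, I would post another question here with a subset of the data that reproduces the problem.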
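The check suggested in the last comment can be sketched in base R. One known way to get missing values in a correlation matrix when the data has no NA is a constant (zero-variance) column. Whether that is the cause in the asker's data is an assumption; the toy data frame and all variable names below are hypothetical.

```r
# Toy data (hypothetical): no NA values, but one constant column
set.seed(1)
x <- data.frame(a = rnorm(20), b = rnorm(20), const = rep(1, 20))

anyNA(x)  # FALSE: the data itself has no missing values

# cor() warns that a standard deviation is zero and returns NA entries
cm <- suppressWarnings(cor(x))
anyNA(cm)  # TRUE: the constant column produces NA correlations

# Identify columns whose standard deviation is exactly zero
zero_sd <- names(x)[vapply(x, function(col) sd(col) == 0, logical(1))]
zero_sd

# Without those columns the matrix is complete; use = "pairwise.complete.obs"
# additionally handles real NA values pair by pair
cm_pair <- cor(x[, setdiff(names(x), zero_sd)], use = "pairwise.complete.obs")
anyNA(cm_pair)  # FALSE
```

If a check like this finds constant columns only inside some resamples, for example after sampling, that would explain why the full data looks clean while train still fails. That remains a hypothesis to verify on the real data.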
ctrl2 <-trainControl(method = "repeatedcv", 
                    number = 10, 
                    repeats = 3, 
                    classProbs = TRUE, 
                    sampling = 'smote',
                    returnResamp="all",
                    summaryFunction = twoClassSummary,
                    preProcOptions = list(cutoff = 0.6))

fit_model2 <- train(Class ~ .,
                   data = Sonar,
                   trControl = ctrl2, 
                   metric = "ROC",
                   method = "ranger",
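                   tuneGrid = grid,  # grid must hold ranger's tuning parameters (mtry, splitrule, min.node.size), not the C5.0 grid from the question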
                   preProcess = c("center", "scale", "nzv","corr"),
                   verbose = FALSE)

fit_model2$preProcess
#output
Created from 679 samples and 60 variables

Pre-processing:
  - centered (23)
  - ignored (0)
  - removed (37)
  - scaled (23)

# fit_model3 is a second fit whose call is not shown here, presumably with a higher cutoff
fit_model3$preProcess
#output
Created from 679 samples and 60 variables

Pre-processing:
  - centered (55)
  - ignored (0)
  - removed (5)
  - scaled (55)
