caret train() with svmRadial and kNN takes too long: is there a way to improve the performance of train()?

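I have a model built with caret's train() function, and I am using it to compare several models. I noticed that svmRadial and kNN take a long time to run on the dataset I chose.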

Dataset: the original dataset is the Census Income (Adult) dataset, but I reduced it, and the resulting data look like this:

'data.frame':   32561 obs. of  13 variables:
 $ age             : int  39 50 38 53 28 37 49 52 31 42 ...
 $ fnlwgt          : int  77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
 $ educationnum    : int  13 13 9 7 13 14 5 9 14 13 ...
 $ maritalstatus   : Factor w/ 7 levels "Divorced","Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
 $ occupation      : Factor w/ 15 levels "?","Adm-clerical",..: 2 5 7 7 11 5 9 5 11 5 ...
 $ race            : Factor w/ 5 levels "Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
 $ sex             : Factor w/ 2 levels "Female","Male": 2 2 2 2 1 1 1 2 1 2 ...
 $ hoursperweek    : int  40 13 40 40 40 40 16 45 50 40 ...
 $ response        : Factor w/ 2 levels "<=50K",">50K": 1 1 1 1 1 1 1 2 2 2 ...
 $ cntrymap        : Factor w/ 9 levels "British-Commonwealth",..: 9 9 9 9 6 9 5 9 9 9 ...
 $ relationship_new: Factor w/ 5 levels "Not-in-family",..: 1 4 1 4 4 4 1 4 1 4 ...
 $ workclass_new   : Factor w/ 8 levels "?","Federal-gov",..: 8 7 5 5 5 5 5 7 5 5 ...
 $ capitalgainloss : int  2174 0 0 0 0 0 0 0 14084 5178 ...
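There are still a lot of factors in the dataset, and I am not sure whether I should preprocess or re-engineer them to improve the performance of the train() call.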
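One way to see how much the factors add to the problem size is to count the columns a formula produces once each factor is expanded into indicator variables. The following is a minimal sketch on made-up data (the `toy` frame and its values are hypothetical, not the census data), assuming train()'s formula interface expands factors the same way model.matrix() does:

```r
# Toy stand-in for the real data: hypothetical values, same kinds of columns.
n <- 200
toy <- data.frame(
  age        = rep(18:67, length.out = n),
  occupation = factor(rep(paste0("occ", 1:15), length.out = n)),  # 15 levels
  race       = factor(rep(paste0("race", 1:5), length.out = n)),  # 5 levels
  response   = factor(rep(c("<=50K", ">50K"), length.out = n))
)

# model.matrix() applies treatment contrasts: a factor with k levels
# becomes k - 1 indicator columns. Drop the intercept column afterwards.
x <- model.matrix(response ~ ., data = toy)[, -1]

# 1 numeric column + 14 occupation indicators + 4 race indicators
n_predictors <- ncol(x)
print(n_predictors)
```

Applied to the str() output above, the same rule gives 5 numeric columns plus 6 + 14 + 4 + 1 + 8 + 4 + 7 = 44 indicator columns, so kNN and svmRadial would be working with 49 predictors rather than 12.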

Running kNN still takes quite a long time, though no longer than svmRadial on the 70/30 split of the data:

adult.kNN <- train(response~., data=adultFile, method="knn", metric=metric, preProc=preProc, trControl=control, tuneLength=10)

adult.svmRadial <- train(response~., data=adultTraining, method="svmRadial", metric=metric, preProc=c("center", "scale"), 
                          trControl=control, tuneLength = 5)
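Because `control` is not shown here, the cost of these two calls can only be estimated under assumptions. The sketch below counts how many model fits one train() call would perform if `control` were 10-fold cross-validation repeated 3 times (both numbers are assumed, and `count_fits` is a hypothetical helper, not a caret function). It also assumes that tuneLength = 10 yields 10 candidate values of k for kNN and that tuneLength = 5 yields 5 candidate cost values for svmRadial:

```r
# Hypothetical resampling settings; the post does not show `control`.
n_folds   <- 10  # assumed: trainControl(method = "repeatedcv", number = 10, ...)
n_repeats <- 3   # assumed: repeats = 3

# One fit per resample per candidate tuning value,
# plus one final fit on the whole training set.
count_fits <- function(n_candidates, folds = n_folds, repeats = n_repeats) {
  folds * repeats * n_candidates + 1
}

knn_fits <- count_fits(10)  # tuneLength = 10
svm_fits <- count_fits(5)   # tuneLength = 5

print(c(knn = knn_fits, svmRadial = svm_fits))
```

Under those assumptions each call refits the model hundreds of times on roughly 20,000 to 30,000 rows, so fewer folds, fewer repeats, or a smaller tuneLength would cut the run time roughly in proportion.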
library(sqldf)  # provides sqldf(), used for the recoding query below

# Download the UCI Adult (Census Income) data and read it in
data_url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
download.file(url = data_url, destfile = "adult.data")
# stringsAsFactors = TRUE keeps text columns as factors on R >= 4.0, matching the str() output above
fullData <- read.csv("adult.data", sep = ",", header = FALSE, strip.white = TRUE, stringsAsFactors = TRUE)
#fullData <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", header=F,strip.white=TRUE)
names(fullData) <- c("age", "workclass", "fnlwgt", "education", "educationnum", "maritalstatus", "occupation", "relationship", "race", "sex", "capitalgain", "capitalloss", "hoursperweek", "nativecountry", "response")

prepInputFile2 <- sqldf("select *,
      case 
        when   nativecountry == 'United-States' then 'United-States'
        when   nativecountry == 'China'    OR nativecountry == 'Hong' OR nativecountry == 'Taiwan' then 'China'
        when   nativecountry == 'Cambodia' OR nativecountry == 'Laos' OR nativecountry == 'Philippines' 
            OR nativecountry == 'Thailand' OR nativecountry == 'Vietnam' then 'SoutEast-Asia'
        when   nativecountry == 'Canada'   OR nativecountry == 'England' OR nativecountry == 'India' OR nativecountry == 'Ireland'
            OR nativecountry == 'Scotland' then 'British-Commonwealth'
        when   nativecountry == 'Columbia' OR nativecountry == 'El-Salvador' OR nativecountry == 'Ecuador' OR nativecountry == 'Peru' 
        then 'South-America'
        when   nativecountry == 'Dominican-Republic' OR nativecountry == 'Guatemala' OR nativecountry == 'Haiti'
            OR nativecountry == 'Honduras' OR nativecountry == 'Jamaica' OR nativecountry == 'Mexico' OR nativecountry =='Nicaragua'
            OR nativecountry == 'Outlying-US(Guam-USVI-etc)' OR nativecountry == 'Puerto-Rico' OR nativecountry =='Trinadad&Tobago' 
        then 'Latin-America'
        when   nativecountry == 'France' OR nativecountry == 'Germany' OR nativecountry == 'Holand-Netherlands' 
            OR nativecountry == 'Italy' then 'Euro-1'
        when   nativecountry == 'Yugoslavia' OR nativecountry == 'Greece' OR nativecountry == 'Hungary' OR nativecountry == 'Poland'
            OR nativecountry == 'Portugal'   OR nativecountry == 'South' then 'Euro-2'
        when   nativecountry == 'Cuba'       OR nativecountry == 'Iran' OR nativecountry == 'Japan' OR nativecountry == '?' then 'Other'
        else 'Undetermined'
      end as cntrymap,
      case
        when relationship == 'Husband' OR relationship == 'Wife' then 'Spouse' else relationship
      end as relationship_new,
      case 
        when workclass == 'Without-pay' then 'Never-worked' else workclass
      end as workclass_new,
      capitalgain - capitalloss as capitalgainloss
    from fullData
      ")
prepInputFile2$cntrymap <- as.factor(prepInputFile2$cntrymap)
prepInputFile2$workclass_new <- as.factor(prepInputFile2$workclass_new)
prepInputFile2$relationship_new <- as.factor(prepInputFile2$relationship_new)
dropColNames <- c('education', 'capitalgain', 'capitalloss', 'workclass', 'relationship', 'nativecountry')
prepInputFile2 <- prepInputFile2[ , !(names(prepInputFile2) %in% dropColNames)]
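The 70/30 split mentioned earlier is not shown in the post. A minimal base-R sketch of one way it could be done is below; `split_data` and `adultTest` are hypothetical names, the split is a simple random one rather than stratified by `response`, and a toy frame stands in for `prepInputFile2` so the snippet runs without the download:

```r
# Hypothetical helper: simple (non-stratified) random train/test split.
split_data <- function(df, train_frac = 0.7, seed = 42) {
  set.seed(seed)
  train_idx <- sample(seq_len(nrow(df)), size = floor(train_frac * nrow(df)))
  list(train = df[train_idx, , drop = FALSE],
       test  = df[-train_idx, , drop = FALSE])
}

# Toy stand-in for prepInputFile2 (made-up values).
toy <- data.frame(age = 1:100,
                  response = factor(rep(c("<=50K", ">50K"), 50)))

parts <- split_data(toy)
adultTraining <- parts$train  # name used in the svmRadial call
adultTest     <- parts$test   # hypothetical name for the held-out 30%

print(c(train = nrow(adultTraining), test = nrow(adultTest)))
```

On the real data the call would be `split_data(prepInputFile2)`. Passing a smaller `train_frac` is also a quick way to time train() on a subsample before committing to the full run.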