Something is wrong; all the ROC metric values are missing:
I am training a model in R using the caret package:
ctrl <- trainControl(method = "repeatedcv", repeats = 3, summaryFunction = twoClassSummary)
logitBoostFit <- train(LoanStatus~., credit, method = "LogitBoost", family=binomial, preProcess=c("center", "scale", "pca"),
trControl = ctrl)
I have installed the pROC package:
install.packages("pROC", repos="http://cran.rstudio.com/")
library(pROC)
Type 'citation("pROC")' for a citation.
Attaching package: ‘pROC’
The following objects are masked from ‘package:stats’:
cov, smooth, var
Here is the data:
str(credit)
'data.frame': 8580 obs. of 45 variables:
$ ListingCategory : int 1 7 3 1 1 7 1 1 1 1 ...
$ IncomeRange : int 3 4 6 4 4 3 3 4 3 3 ...
$ StatedMonthlyIncome : num 2583 4326 10500 4167 5667 ...
$ IncomeVerifiable : logi TRUE TRUE TRUE FALSE TRUE TRUE ...
$ DTIwProsperLoan : num 1.8e-01 2.0e-01 1.7e-01 1.0e+06 1.8e-01 4.4e-01 2.2e-01 2.0e-01 2.0e-01 3.1e-01 ...
$ EmploymentStatusDescription: Factor w/ 7 levels "Employed","Full-time",..: 1 4 1 7 1 1 1 1 1 1 ...
$ Occupation : Factor w/ 65 levels "","Accountant/CPA",..: 37 37 20 14 43 58 48 37 37 37 ...
$ MonthsEmployed : int 4 44 159 67 26 16 209 147 24 9 ...
$ BorrowerState : Factor w/ 48 levels "AK","AL","AR",..: 22 32 5 5 14 28 4 10 10 34 ...
$ BorrowerCity : Factor w/ 3089 levels "AARONSBURG","ABERDEEN",..: 1737 3059 2488 654 482 719 895 1699 2747 1903 ...
$ BorrowerMetropolitanArea : Factor w/ 1 level "(Not Implemented)": 1 1 1 1 1 1 1 1 1 1 ...
$ LenderIndicator : int 0 0 0 1 0 0 0 0 1 0 ...
$ GroupIndicator : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
$ GroupName : Factor w/ 83 levels "","00 Used Car Loans",..: 1 1 1 47 1 1 1 1 1 1 ...
$ ChannelCode : int 90000 90000 90000 80000 40000 40000 90000 90000 80000 90000 ...
$ AmountParticipation : int 0 0 0 0 0 0 0 0 0 0 ...
$ MonthlyDebt : int 247 785 1631 817 644 1524 427 817 654 749 ...
$ CurrentDelinquencies : int 0 0 0 0 0 0 0 1 0 1 ...
$ DelinquenciesLast7Years : int 0 10 0 0 0 0 0 0 0 0 ...
$ PublicRecordsLast10Years : int 0 1 0 0 0 0 1 0 1 0 ...
$ PublicRecordsLast12Months : int 0 0 0 0 0 0 0 0 0 0 ...
$ FirstRecordedCreditLine : Factor w/ 4719 levels "1/1/00 0:00",..: 3032 2673 1197 2541 4698 4345 3150 925 4452 2358 ...
$ CreditLinesLast7Years : int 53 30 36 26 7 22 15 20 34 32 ...
$ InquiriesLast6Months : int 2 8 5 0 0 0 0 3 0 0 ...
$ AmountDelinquent : int 0 0 0 0 0 0 0 63 0 15 ...
$ CurrentCreditLines : int 10 10 18 10 4 11 6 10 7 8 ...
$ OpenCreditLines : int 9 10 15 8 3 8 5 7 7 8 ...
$ BankcardUtilization : num 0.26 0.69 0.94 0.69 0.81 0.38 0.55 0.24 0.03 0 ...
$ TotalOpenRevolvingAccounts : int 9 7 12 10 3 5 4 5 4 6 ...
$ InstallmentBalance : int 48648 14827 0 0 0 30916 0 21619 41340 15447 ...
$ RealEstateBalance : int 0 0 577745 0 0 0 191296 0 0 126039 ...
$ RevolvingBalance : int 5265 9967 94966 50511 37871 22463 19550 2436 1223 3236 ...
$ RealEstatePayment : int 0 0 4159 0 0 0 1303 0 0 1279 ...
$ RevolvingAvailablePercent : int 78 52 36 45 18 61 44 74 96 76 ...
$ TotalInquiries : int 8 11 15 2 0 0 1 7 1 1 ...
$ TotalTradeItems : int 53 30 36 26 7 22 15 20 34 32 ...
$ SatisfactoryAccounts : int 52 23 36 26 7 19 15 18 34 29 ...
$ NowDelinquentDerog : int 0 0 0 0 0 0 0 1 0 1 ...
$ WasDelinquentDerog : int 1 7 0 0 0 3 0 1 0 2 ...
$ OldestTradeOpenDate : int 5092001 5011977 12011984 4272000 9081993 9122000 6161987 11181999 9191990 4132000 ...
$ DelinquenciesOver30Days : int 0 6 0 0 0 13 0 2 0 2 ...
$ DelinquenciesOver60Days : int 0 4 0 0 0 0 0 0 0 1 ...
$ DelinquenciesOver90Days : int 0 10 0 0 0 0 0 0 0 0 ...
$ IsHomeowner : logi FALSE FALSE TRUE FALSE FALSE FALSE ...
$ LoanStatus : Factor w/ 2 levels "0","1": 2 1 1 2 2 2 2 2 2 1 ...
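summary(credit)
 ListingCategory  IncomeRange     StatedMonthlyIncome IncomeVerifiable
 Min.   : 0.000   Min.   :1.000   Min.   :     0      Mode :logical
 1st Qu.: 1.000   1st Qu.:3.000   1st Qu.:  3167      FALSE:784
 Median : 2.000   Median :4.000   Median :  4750      TRUE :7796
 Mean   : 4.997   Mean   :4.089   Mean   :  5755      NA's :0
 3rd Qu.: 7.000   3rd Qu.:5.000   3rd Qu.:  7083
 Max.   :20.000   Max.   :7.000   Max.   :250000
 DTIwProsperLoan     EmploymentStatusDescription MonthsEmployed
 Min.   :      0.0   Employed     :7182          Min.   :-23.00
 1st Qu.:      0.1   Full-time    : 416          1st Qu.: 26.00
 Median :      0.2   Not employed : 122          Median : 68.00
 Mean   :  91609.4   Other        : 475          Mean   : 97.44
 3rd Qu.:      0.3   Part-time    :   7          3rd Qu.:139.00
 Max.   :1000000.0   Retired      :  32          Max.   :755.00
                     Self-employed: 346          NA's   :5
 BorrowerState  LenderIndicator   GroupIndicator  ChannelCode
 CA     :1056   Min.   :0.00000   Mode :logical   Min.   :40000
 FL     : 608   1st Qu.:0.00000   FALSE:8325      1st Qu.:80000
 NY     : 574   Median :0.00000   TRUE :255       Median :80000
 TX     : 532   Mean   :0.09196   NA's :0         Mean   :77196
 IL     : 443   3rd Qu.:0.00000                   3rd Qu.:90000
 GA     : 343   Max.   :1.00000                   Max.   :90000
 (Other):5024
 MonthlyDebt       CurrentDelinquencies DelinquenciesLast7Years
 Min.   :    0.0   Min.   : 0.0000      Min.   : 0.000
 1st Qu.:  364.0   1st Qu.: 0.0000      1st Qu.: 0.000
 Median :  708.0   Median : 0.0000      Median : 0.000
 Mean   :  885.5   Mean   : 0.4119      Mean   : 4.009
 3rd Qu.: 1205.2   3rd Qu.: 0.0000      3rd Qu.: 3.000
 Max.   :30213.0   Max.   :21.0000      Max.   :99.000
 PublicRecordsLast10Years PublicRecordsLast12Months CreditLinesLast7Years
 Min.   : 0.0000          Min.   :0.00000           Min.   :  2.0
 1st Qu.: 0.0000          1st Qu.:0.00000           1st Qu.: 16.0
 Median : 0.0000          Median :0.00000           Median : 24.0
 Mean   : 0.2809          Mean   :0.01364           Mean   : 26.1
 3rd Qu.: 0.0000          3rd Qu.:0.00000           3rd Qu.: 34.0
 Max.   :11.0000          Max.   :4.00000           Max.   :115.0
 InquiriesLast6Months AmountDelinquent CurrentCreditLines OpenCreditLines
 Min.   : 0.0000      Min.   :     0   Min.   : 0.000     Min.   : 0.000
 1st Qu.: 0.0000      1st Qu.:     0   1st Qu.: 5.000     1st Qu.: 5.000
 Median : 1.0000      Median :     0   Median : 9.000     Median : 8.000
 Mean   : 0.9994      Mean   :  1195   Mean   : 9.345     Mean   : 8.306
 3rd Qu.: 1.0000      3rd Qu.:     0   3rd Qu.:12.000     3rd Qu.:11.000
 Max.   :15.0000      Max.   :179158   Max.   :54.000     Max.   :42.000
 BankcardUtilization TotalOpenRevolvingAccounts InstallmentBalance
 Min.   :0.0000      Min.   : 0.000             Min.   :     0
 1st Qu.:0.2500      1st Qu.: 3.000             1st Qu.:  3338
 Median :0.5400      Median : 6.000             Median : 14453
 Mean   :0.5182      Mean   : 6.441             Mean   : 24900
 3rd Qu.:0.7900      3rd Qu.: 9.000             3rd Qu.: 32238
 Max.   :2.2300      Max.   :44.000             Max.   :739371
                                                NA's   :328
 RealEstateBalance RevolvingBalance RealEstatePayment RevolvingAvailablePercent
 Min.   :      0   Min.   :     0   Min.   :    0.0   Min.   :  0.00
 1st Qu.:      0   1st Qu.:  2799   1st Qu.:    0.0   1st Qu.: 29.00
 Median :  26154   Median :  8784   Median :  346.5   Median : 52.00
 Mean   : 109306   Mean   : 19555   Mean   :  830.5   Mean   : 51.46
 3rd Qu.: 176542   3rd Qu.: 21110   3rd Qu.: 1382.2   3rd Qu.: 75.00
 Max.   :1938421   Max.   :695648   Max.   :13651.0   Max.   :100.00
 TotalInquiries  TotalTradeItems SatisfactoryAccounts NowDelinquentDerog
 Min.   : 0.00   Min.   :  2.0   Min.   :  1.00       Min.   : 0.0000
 1st Qu.: 2.00   1st Qu.: 16.0   1st Qu.: 14.00       1st Qu.: 0.0000
 Median : 3.00   Median : 24.0   Median : 21.00       Median : 0.0000
 Mean   : 3.91   Mean   : 26.1   Mean   : 23.34       Mean   : 0.4119
 3rd Qu.: 5.00   3rd Qu.: 34.0   3rd Qu.: 30.25       3rd Qu.: 0.0000
 Max.   :36.00   Max.   :115.0   Max.   :113.00       Max.   :21.0000
 WasDelinquentDerog OldestTradeOpenDate DelinquenciesOver30Days
 Min.   : 0.000     Min.   : 1011957    Min.   : 0.000
 1st Qu.: 0.000     1st Qu.: 4101996    1st Qu.: 0.000
 Median : 1.000     Median : 7191993    Median : 1.000
 Mean   : 2.343     Mean   : 6934230    Mean   : 4.332
 3rd Qu.: 3.000     3rd Qu.:10011990    3rd Qu.: 5.000
 Max.   :32.000     Max.   :12312004    Max.   :99.000
 DelinquenciesOver60Days DelinquenciesOver90Days IsHomeowner     LoanStatus
 Min.   : 0.000          Min.   : 0.000          Mode :logical   0:1518
 1st Qu.: 0.000          1st Qu.: 0.000          FALSE:4264      1:7062
 Median : 0.000          Median : 0.000          TRUE :4316
 Mean   : 1.908          Mean   : 4.009          NA's :0
 3rd Qu.: 2.000          3rd Qu.: 3.000
 Max.   :73.000          Max.   :99.000
I did not find any missing values: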
try(na.fail(credit))
dput(head(credit,4))
structure(list(ListingCategory = c(1L, 7L, 3L, 1L), IncomeRange = c(3L,
4L, 6L, 4L), StatedMonthlyIncome = c(2583.3333, 4326, 10500,
4166.6667), IncomeVerifiable = c(TRUE, TRUE, TRUE, FALSE), DTIwProsperLoan = c(0.18,
0.2, 0.17, 1e+06), EmploymentStatusDescription = structure(c(1L,
4L, 1L, 7L), .Label = c("Employed", "Full-time", "Not employed",
"Other", "Part-time", "Retired", "Self-employed"), class = "factor"),
MonthsEmployed = c(4L, 44L, 159L, 67L), BorrowerState = structure(c(22L,
32L, 5L, 5L), .Label = c("AK", "AL", "AR", "AZ", "CA", "CO",
"CT", "DC", "DE", "FL", "GA", "HI", "ID", "IL", "IN", "KS",
"KY", "LA", "MA", "MD", "MI", "MN", "MO", "MS", "MT", "NC",
"NE", "NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA",
"RI", "SC", "SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI",
"WV", "WY"), class = "factor"), LenderIndicator = c(0L, 0L,
0L, 1L), GroupIndicator = c(FALSE, FALSE, FALSE, TRUE), ChannelCode = c(90000L,
90000L, 90000L, 80000L), MonthlyDebt = c(247L, 785L, 1631L,
817L), CurrentDelinquencies = c(0L, 0L, 0L, 0L), DelinquenciesLast7Years = c(0L,
10L, 0L, 0L), PublicRecordsLast10Years = c(0L, 1L, 0L, 0L
), PublicRecordsLast12Months = c(0L, 0L, 0L, 0L), CreditLinesLast7Years = c(53L,
30L, 36L, 26L), InquiriesLast6Months = c(2L, 8L, 5L, 0L),
AmountDelinquent = c(0L, 0L, 0L, 0L), CurrentCreditLines = c(10L,
10L, 18L, 10L), OpenCreditLines = c(9L, 10L, 15L, 8L), BankcardUtilization = c(0.26,
0.69, 0.94, 0.69), TotalOpenRevolvingAccounts = c(9L, 7L,
12L, 10L), InstallmentBalance = c(48648L, 14827L, 0L, 0L),
RealEstateBalance = c(0L, 0L, 577745L, 0L), RevolvingBalance = c(5265L,
9967L, 94966L, 50511L), RealEstatePayment = c(0L, 0L, 4159L,
0L), RevolvingAvailablePercent = c(78L, 52L, 36L, 45L), TotalInquiries = c(8L,
11L, 15L, 2L), TotalTradeItems = c(53L, 30L, 36L, 26L), SatisfactoryAccounts = c(52L,
23L, 36L, 26L), NowDelinquentDerog = c(0L, 0L, 0L, 0L), WasDelinquentDerog = c(1L,
7L, 0L, 0L), OldestTradeOpenDate = c(5092001L, 5011977L,
12011984L, 4272000L), DelinquenciesOver30Days = c(0L, 6L,
0L, 0L), DelinquenciesOver60Days = c(0L, 4L, 0L, 0L), DelinquenciesOver90Days = c(0L,
10L, 0L, 0L), IsHomeowner = c(FALSE, FALSE, TRUE, FALSE),
LoanStatus = structure(c(2L, 1L, 1L, 2L), .Label = c("0",
"1"), class = "factor")), .Names = c("ListingCategory", "IncomeRange",
"StatedMonthlyIncome", "IncomeVerifiable", "DTIwProsperLoan",
"EmploymentStatusDescription", "MonthsEmployed", "BorrowerState",
"LenderIndicator", "GroupIndicator", "ChannelCode", "MonthlyDebt",
"CurrentDelinquencies", "DelinquenciesLast7Years", "PublicRecordsLast10Years",
"PublicRecordsLast12Months", "CreditLinesLast7Years", "InquiriesLast6Months",
"AmountDelinquent", "CurrentCreditLines", "OpenCreditLines",
"BankcardUtilization", "TotalOpenRevolvingAccounts", "InstallmentBalance",
"RealEstateBalance", "RevolvingBalance", "RealEstatePayment",
"RevolvingAvailablePercent", "TotalInquiries", "TotalTradeItems",
"SatisfactoryAccounts", "NowDelinquentDerog", "WasDelinquentDerog",
"OldestTradeOpenDate", "DelinquenciesOver30Days", "DelinquenciesOver60Days",
"DelinquenciesOver90Days", "IsHomeowner", "LoanStatus"), row.names = c(NA,
4L), class = "data.frame")
What is wrong?
Warning message:
In train.default(x, y, weights = w, ...): The metric "Accuracy" was not in the result set. ROC will be used instead.
# weights: 72 (71 variable)
initial value 5144.538374
iter 10 value 3540.667624
iter 20 value 3329.692768
iter 30 value 3279.191024
iter 40 value 3264.926986
iter 50 value 3259.276647
iter 60 value 3259.056261
final value 3259.032668
converged
# weights: 72 (71 variable)
initial value 5144.538374
iter 10 value 3540.774666
iter 20 value 3330.016829
iter 30 value 3279.545595
iter 40 value 3265.384385
iter 50 value 3259.499032
iter 60 value 3259.353010
final value 3259.342601
converged
# weights: 72 (71 variable)
initial value 5144.538374
iter 10 value 3540.667731
iter 20 value 3329.693092
iter 30 value 3279.191379
iter 40 value 3264.927427
iter 50 value 3259.276899
iter 60 value 3259.056561
final value 3259.032978
converged
# weights: 72 (71 variable)
initial value 5144.538374
iter 10 value 3528.401458
iter 20 value 3314.932958
iter 30 value 3264.117072
iter 40 value 3253.780051
iter 50 value 3253.368959
iter 60 value 3253.359047
final value 3253.358819
converged
# weights: 72 (71 variable)
initial value 5144.538374
iter 10 value 3528.508505
iter 20 value 3315.134599
iter 30 value 3265.021404
iter 40 value 3255.739021
iter 50 value 3253.817833
iter 60 value 3253.697180
final value 3253.671003
converged
# weights: 72 (71 variable)
initial value 5144.538374
iter 10 value 3528.401565
iter 20 value 3314.933160
iter 30 value 3264.117768
iter 40 value 3253.780539
iter 50 value 3253.369030
iter 60 value 3253.359358
final value 3253.359133
converged
# weights: 71 (70 variable)
initial value 5145.231521
iter 10 value 4680.326236
iter 20 value 4672.506024
iter 30 value 3662.998233
iter 40 value 3310.207744
iter 50 value 3252.983656
iter 60 value 3250.400275
iter 70 value 3250.339216
final value 3250.332646
converged
…
# weights: 72 (71 variable)
initial value 5144.538374
iter 10 value 4661.569290
iter 20 value 4652.246624
iter 30 value 3715.472355
iter 40 value 3484.096833
iter 50 value 3254.247424
iter 60 value 3248.93184
ctrl <- trainControl(method = "cv", summaryFunction = twoClassSummary)
multinomSummaryFit <- train(LoanStatus~., credit, method = "multinom", family=binomial,
trControl = ctrl)
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.
Something is wrong; all the ROC metric values are missing:
ROC Sens Spec
Min. : NA Min. :0.01919 Min. :0.9941
1st Qu.: NA 1st Qu.:0.01988 1st Qu.:0.9942
Median : NA Median :0.02056 Median :0.9943
Mean :NaN Mean :0.02011 Mean :0.9943
3rd Qu.: NA 3rd Qu.:0.02056 3rd Qu.:0.9943
Max. : NA Max. :0.02057 Max. :0.9944
NA's :3
Error in train.default(x, y, weights = w, ...): Stopping
MonthsEmployed
Min. :-23.00
1st Qu.: 26.00
Median : 68.00
Mean : 97.44
3rd Qu.:139.00
Max. :755.00
NA's :5
InstallmentBalance
Min. : 0
1st Qu.: 3338
Median : 14453
Mean : 24900
3rd Qu.: 32238
Max. :739371
NA's :328
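The NA's rows in the two summaries above show that MonthsEmployed and InstallmentBalance do contain missing values. A minimal sketch of how one might count and drop them, using a small made-up data frame named toy as a stand-in for credit (the values and the toy name are assumptions for illustration only):

```r
# Toy stand-in for `credit`; the values are made up for illustration.
toy <- data.frame(
  MonthsEmployed     = c(4L, NA, 159L, 67L),
  InstallmentBalance = c(48648L, 14827L, NA, 0L),
  LoanStatus         = factor(c("1", "0", "0", "1"))
)

# Count the missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the rows that have no missing value in any column.
toy_complete <- toy[complete.cases(toy), ]
print(nrow(toy_complete))
```

Dropping rows is only one option; whether it is appropriate for the real data depends on how many rows are affected and why the values are missing.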
# twoClassSummary needs class probabilities to compute ROC,
# so classProbs = TRUE must be set in trainControl
ctrl <- trainControl(method = "repeatedcv",
                     repeats = 3,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)
multinomSummaryFit <- train(LoanStatus~.,
                            data = credit,
                            method = "multinom",
                            family=binomial,
                            metric = "ROC",
                            trControl = ctrl)
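One thing to watch for with classProbs = TRUE: caret names the class-probability columns after the outcome's levels, so levels such as "0" and "1" are likely to be rejected as not being valid R variable names. A minimal sketch of one way to rename them, shown on a toy factor named status (an assumption, standing in for credit$LoanStatus):

```r
# Toy factor with the same level names as LoanStatus in the question.
status <- factor(c("1", "0", "0", "1"), levels = c("0", "1"))

# make.names() turns "0" and "1" into syntactically valid names
# by prefixing them with "X".
levels(status) <- make.names(levels(status))
print(levels(status))
```

Descriptive labels could be used instead of the make.names() output, but which level means what in this data set is not stated in the question, so that mapping is left out here.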
# Let's say your test data frame is called test
mymodel_pred <- predict(multinomSummaryFit, test[, names(test) != "LoanStatus"])
confusionMatrix(data = mymodel_pred,
                reference = test$LoanStatus,
                positive = "Default")  # must be one of the levels of test$LoanStatus
mymodel_pred <- predict(multinomSummaryFit, test)
trainingRows <- createDataPartition(credit$LoanStatus, p = .70, list= FALSE)
train <- credit[trainingRows, ]
test <- credit[-trainingRows, ]