How to reproduce the $resample and $results objects of a 'train' object in caret?
I am new to the fantastic caret package and am trying to reproduce some of the objects in the train() output of an lm model fitted with resampling method = 'timeslice'.
Why do $results$RMSE and $results$Rsquared in my example differ from the output of defaultSummary() computed from $pred$pred and $pred$obs?
Which data are RMSE, Rsquared and MAE in $resample computed from?
require(caret)
require(doParallel)
no_cores <- detectCores() - 1
cls = makeCluster(no_cores)
registerDoParallel(cls)
data(economics)
#str(economics)
ec.data <- as.data.frame(economics[,-1]) #drop 'date' column
#head(ec.data)
#trainControl() with parallel processing and 1 step forecasts by TimeSlices------------------------
set.seed(123)
samplesCount = nrow(ec.data)
initialWindow = 10
h = 1
s = 0
M = 1 # no of models that are evaluated during each resample (tuning parameters)
#seeds
resamplesCount = length(createTimeSlices(1:samplesCount, initialWindow, horizon = h, fixedWindow = TRUE, skip = s)$test)
seeds <- vector(mode = "list", length = resamplesCount + 1) # length = B+1, B = number of resamples
for(i in 1:resamplesCount) seeds[[i]] <- sample.int(1000, M) # The first B elements of the list should be vectors of integers of >= length M where M is the number of models being evaluated for each resample.
seeds[[(resamplesCount+1)]] <- sample.int(1000, 1) # The last element of the list only needs to be a single integer (for the final model)
trainCtrl.ec <- trainControl(
method = "timeslice", initialWindow = initialWindow, horizon = h, skip = s, # data splitting
returnResamp = "all",
savePredictions = "all",
seeds = seeds,
allowParallel = TRUE)
lm.fit.ec <- train( unemploy ~ ., data = ec.data,
method = "lm",
trControl = trainCtrl.ec)
lm.fit.ec
head(lm.fit.ec$resample)
> head(lm.fit.ec$resample)
RMSE Rsquared MAE intercept Resample
1 16.33273 NA 16.33273 TRUE Training010
2 232.16184 NA 232.16184 TRUE Training011
3 197.65143 NA 197.65143 TRUE Training012
4 393.29469 NA 393.29469 TRUE Training013
5 129.99157 NA 129.99157 TRUE Training014
6 60.95649 NA 60.95649 TRUE Training015
first_holdout <- subset(lm.fit.ec$pred, Resample == "Training010")
first_holdout
> first_holdout
pred obs rowIndex intercept Resample
1 2756.333 2740 11 TRUE Training010 # only 1 row since 1 step forecast horizon
# Calculate RMSE, Rsquared and MAE for the holdout set
postResample(first_holdout$pred, first_holdout$obs)
> postResample(first_holdout$pred, first_holdout$obs)
RMSE Rsquared MAE
16.33273 NA 16.33273
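To make the resample names easier to follow, here is a minimal base-R sketch of how fixed-window time slices line up with the `Resample` labels and `rowIndex` values above, assuming `initialWindow = 10`, `horizon = 1`, `skip = 0` and a fixed window as in the code. The helper `time_slices()` and the series length `n = 120` are hypothetical; the helper only mirrors what I understand `createTimeSlices()` to return and is not caret's own code. It also builds a seeds list with the B + 1 layout described in the code comments.

```r
# Hypothetical helper mirroring fixed-window time slicing: each slice trains
# on `initial_window` consecutive rows and tests on the next `horizon` rows,
# then the window moves forward by skip + 1 rows.
time_slices <- function(n, initial_window, horizon = 1, skip = 0) {
  ends  <- seq(initial_window, n - horizon, by = skip + 1)
  train <- lapply(ends, function(e) (e - initial_window + 1):e)
  test  <- lapply(ends, function(e) (e + 1):(e + horizon))
  # Name slices by their last training row, zero-padded to a common width
  nums <- sprintf(paste0("%0", nchar(as.character(max(ends))), "d"), ends)
  names(train) <- paste0("Training", nums)
  names(test)  <- paste0("Testing", nums)
  list(train = train, test = test)
}

set.seed(123)
n <- 120                 # hypothetical series length, not nrow(ec.data)
slices <- time_slices(n, initial_window = 10, horizon = 1)
B <- length(slices$test) # number of resamples: 120 - 10 - 1 + 1 = 110
M <- 1                   # models (tuning combinations) per resample

# Seeds list with B + 1 elements: B vectors of length M, then one integer
seeds <- vector(mode = "list", length = B + 1)
for (i in seq_len(B)) seeds[[i]] <- sample.int(1000, M)
seeds[[B + 1]] <- sample.int(1000, 1)

names(slices$train)[1]   # "Training010"
slices$train[[1]]        # rows 1 to 10
slices$test[[1]]         # row 11, the rowIndex shown for Training010
```

With a 1-step horizon every test element holds a single row, which is why `first_holdout` above has exactly one row.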
Why are RMSE and Rsquared in the train() output not the same as when they are computed with defaultSummary()?
Session info:
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 LC_MONETARY=Swedish_Sweden.1252
[4] LC_NUMERIC=C LC_TIME=Swedish_Sweden.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] fpp_0.5 tseries_0.10-42 lmtest_0.9-35 zoo_1.8-0
[5] expsmooth_2.3 fma_2.3 forecast_8.2 mlbench_2.1-1
[9] spikeslab_1.1.5 randomForest_4.6-12 lars_1.2 doParallel_1.0.11
[13] iterators_1.0.8 foreach_1.4.3 caret_6.0-77.9000 ggplot2_2.2.1
[17] lattice_0.20-35
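I found the answer to my question here:
Question 1. Why do $results$RMSE and $results$Rsquared in my example differ from the output of defaultSummary() computed from $pred$pred and $pred$obs?
A: The train() output is computed as the mean over the holdouts, i.e. over the rows of $resample. In my example: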
# The output is the mean of $resample
mean(lm.fit.ec$resample$RMSE) # =250.072
mean(lm.fit.ec$resample$MAE) # =250.072
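A small base-R sketch of why the two numbers differ, using made-up 1-step-ahead predictions. The data frame `pred` is hypothetical and only mimics the `pred`, `obs` and `Resample` columns of `lm.fit.ec$pred`. As I understand it, `$results` averages the per-resample metrics, whereas `defaultSummary()` applied to the whole `$pred` treats all holdout rows as one pooled sample.

```r
# Hypothetical stand-in for lm.fit.ec$pred: one row per 1-step holdout
pred <- data.frame(
  pred     = c(2756, 2800, 2900, 3100),
  obs      = c(2740, 2820, 2850, 3000),
  Resample = c("Training010", "Training011", "Training012", "Training013"),
  stringsAsFactors = FALSE
)

rmse <- function(p, o) sqrt(mean((p - o)^2))
mae  <- function(p, o) mean(abs(p - o))

# Per-resample metrics, analogous to lm.fit.ec$resample
per_resample <- do.call(rbind, lapply(split(pred, pred$Resample), function(d)
  data.frame(RMSE = rmse(d$pred, d$obs), MAE = mae(d$pred, d$obs),
             Resample = d$Resample[1], stringsAsFactors = FALSE)))

# Mean over resamples (what $results reports, as I understand it)
mean_rmse <- mean(per_resample$RMSE)   # 46.5
mean_mae  <- mean(per_resample$MAE)    # 46.5

# Metrics on the pooled predictions (what defaultSummary() on $pred computes)
pooled_rmse <- rmse(pred$pred, pred$obs)  # sqrt(3289), about 57.35
pooled_mae  <- mae(pred$pred, pred$obs)   # 46.5
```

With one row per holdout, each per-slice RMSE is just the absolute error, so the mean RMSE equals the MAE, matching the identical RMSE and MAE columns in `$resample`. The pooled RMSE squares the errors before averaging and is therefore larger whenever the errors differ in size.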
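Question 2. Which data are RMSE, Rsquared and MAE in $resample computed from?
A: Each row of $resample is computed from that slice's holdout rows in $pred; applying postResample() to the first holdout reproduces the first row of $resample.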
head(lm.fit.ec$resample)
> head(lm.fit.ec$resample)
RMSE Rsquared MAE intercept Resample
1 16.33273 NA 16.33273 TRUE Training010
2 232.16184 NA 232.16184 TRUE Training011
3 197.65143 NA 197.65143 TRUE Training012
4 393.29469 NA 393.29469 TRUE Training013
5 129.99157 NA 129.99157 TRUE Training014
6 60.95649 NA 60.95649 TRUE Training015
first_holdout <- subset(lm.fit.ec$pred, Resample == "Training010")
first_holdout
> first_holdout
pred obs rowIndex intercept Resample
1 2756.333 2740 11 TRUE Training010 # only 1 row since 1 step forecast horizon
# Calculate RMSE, Rsquared and MAE for the holdout set
postResample(first_holdout$pred, first_holdout$obs)
> postResample(first_holdout$pred, first_holdout$obs)
RMSE Rsquared MAE
16.33273 NA 16.33273
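To check every holdout at once rather than only `Training010`, here is a sketch that recomputes each row of `$resample` from a predictions data frame. The function `resample_metrics()` is hypothetical. The rule that Rsquared is NA when `pred` or `obs` has fewer than two distinct values reflects my reading of `postResample()` and should be treated as an assumption. In practice you would pass `lm.fit.ec$pred`; a made-up data frame keeps the example self-contained.

```r
# Hypothetical helper: per-resample RMSE, Rsquared and MAE from a
# predictions data frame with pred, obs and Resample columns.
resample_metrics <- function(pred_df) {
  groups <- split(pred_df, pred_df$Resample)
  rows <- lapply(names(groups), function(nm) {
    d <- groups[[nm]]
    err <- d$pred - d$obs
    # A squared correlation needs at least two distinct values in each vector;
    # a 1-step holdout has a single row, so Rsquared is NA.
    rsq <- if (length(unique(d$pred)) < 2 || length(unique(d$obs)) < 2) {
      NA_real_
    } else {
      cor(d$pred, d$obs)^2
    }
    data.frame(RMSE = sqrt(mean(err^2)), Rsquared = rsq,
               MAE = mean(abs(err)), Resample = nm,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up predictions: two 1-row holdouts and one 3-row holdout
pred_df <- data.frame(
  pred     = c(2756.333, 2800, 10, 20, 30),
  obs      = c(2740, 2820, 12, 18, 33),
  Resample = c("Training010", "Training011", rep("Training012", 3)),
  stringsAsFactors = FALSE
)
res <- resample_metrics(pred_df)
res
```

The 1-row slices get identical RMSE and MAE and an NA Rsquared, just like `lm.fit.ec$resample`; only the hypothetical 3-row slice yields a finite Rsquared.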
My confusion here was mainly because Rsquared was NA. But since the forecast horizon is 1 step, every holdout sample contains only one row, so Rsquared cannot be computed.
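The NA can be seen directly: a single holdout row has no sample variance, so no correlation can be formed from it. If one R-squared for the 1-step forecasts is still wanted, a sketch of one option (my suggestion, not what caret reports in `$results`) is to compute the squared correlation once over all pooled holdout rows, which as far as I can tell is what `defaultSummary()` does when given the whole `$pred`. The vectors `fc` and `actual` are made up.

```r
# Made-up pooled 1-step-ahead forecasts and observations
# (stand-ins for lm.fit.ec$pred$pred and lm.fit.ec$pred$obs)
fc     <- c(2756, 2800, 2900, 3100, 3050)
actual <- c(2740, 2820, 2850, 3000, 3080)

# One holdout row: the sample variance divides by n - 1 = 0, so it is NA
var(actual[1])

# Pooled over all holdouts: the squared Pearson correlation is defined
pooled_rsq <- cor(fc, actual)^2
pooled_rsq
```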