How to reproduce the $resample and $results objects of 'train' in caret? (tags: r, regression, r-caret)


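I'm new to the wonderful caret package and am trying to reproduce some of the objects in the train() output of an lm model fitted with resampling method = 'timeslice'.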

  • Why do $results$RMSE and $results$Rsquared in my example differ from the output of defaultSummary($pred$pred, $pred$obs)?
  • What data are the RMSE, Rsquared and MAE in $resample calculated from?

    require(caret)
    require(doParallel)
    
    no_cores <- detectCores() - 1  
    cls = makeCluster(no_cores)
    registerDoParallel(cls)
    
    data(economics)
    #str(economics)
    ec.data <- as.data.frame(economics[,-1]) #drop 'date' column
    #head(ec.data)
    
    #trainControl() with parallel processing and 1 step forecasts by TimeSlices------------------------
    set.seed(123)
    samplesCount = nrow(ec.data)
    initialWindow  = 10
    h = 1
    s = 0
    M = 1 # no of models that are evaluated during each resample (tuning parameters)
    
    #seeds
    resamplesCount = length(createTimeSlices(1:samplesCount, initialWindow, horizon = h, fixedWindow = TRUE, skip = s)$test)
    seeds <- vector(mode = "list", length = resamplesCount + 1)   # length = B+1, B = number of resamples
    for(i in 1:resamplesCount) seeds[[i]] <- sample.int(1000, M)  # The first B elements of the list should be vectors of integers of >= length M where M is the number of models being evaluated for each resample.
    seeds[[(resamplesCount+1)]] <- sample.int(1000, 1) # The last element of the list only needs to be a single integer (for the final model)
    
    
    trainCtrl.ec <- trainControl(
      method = "timeslice", initialWindow = initialWindow, horizon = h, skip = s,    # data splitting
      returnResamp = "all",
      savePredictions = "all",
      seeds = seeds,
      allowParallel = TRUE)
    
    
    lm.fit.ec <- train( unemploy ~ ., data = ec.data,
                      method = "lm",
                      trControl = trainCtrl.ec)
    
    lm.fit.ec
    head(lm.fit.ec$resample)
    
    > head(lm.fit.ec$resample)
           RMSE Rsquared       MAE intercept    Resample
    1  16.33273       NA  16.33273      TRUE Training010
    2 232.16184       NA 232.16184      TRUE Training011
    3 197.65143       NA 197.65143      TRUE Training012
    4 393.29469       NA 393.29469      TRUE Training013
    5 129.99157       NA 129.99157      TRUE Training014
    6  60.95649       NA  60.95649      TRUE Training015
    
    
    first_holdout <- subset(lm.fit.ec$pred, Resample == "Training010")
    first_holdout
    
    > first_holdout
    pred        obs rowIndex intercept    Resample
    1 2756.333 2740       11      TRUE Training010  # only 1 row since 1 step forecast horizon
    
    
    # Calculate RMSE, Rsquared and MAE for the holdout set
    postResample(first_holdout$pred, first_holdout$obs)
    
    > postResample(first_holdout$pred, first_holdout$obs)
    RMSE     Rsquared      MAE 
    16.33273       NA     16.33273
    
Why are the RMSE and Rsquared values different from what I get when I compute them with defaultSummary()?

Session info:

    > sessionInfo()
    R version 3.4.2 (2017-09-28)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows >= 8 x64 (build 9200)
    
    Matrix products: default
    
    locale:
    [1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252    LC_MONETARY=Swedish_Sweden.1252
    [4] LC_NUMERIC=C                    LC_TIME=Swedish_Sweden.1252    
    
    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
     [1] fpp_0.5             tseries_0.10-42     lmtest_0.9-35       zoo_1.8-0          
     [5] expsmooth_2.3       fma_2.3             forecast_8.2        mlbench_2.1-1      
     [9] spikeslab_1.1.5     randomForest_4.6-12 lars_1.2            doParallel_1.0.11  
    [13] iterators_1.0.8     foreach_1.4.3       caret_6.0-77.9000   ggplot2_2.2.1      
    [17] lattice_0.20-35 
    

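I found the answers to my questions here:

Q1. Why do $results$RMSE and $results$Rsquared in my example differ from the output of defaultSummary($pred$pred, $pred$obs)?

A: The metrics that train() reports are the means of the per-resample values in $resample. In my example: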

        # The output is the mean of $resample
        mean(lm.fit.ec$resample$RMSE)  # =250.072
        mean(lm.fit.ec$resample$MAE)   # =250.072
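A minimal base-R sketch of why the two numbers differ, using invented numbers (not the economics data): train() averages the metric computed separately for each resample, whereas running a summary such as defaultSummary() over all rows of $pred pools every holdout prediction first. The names `pred_df` and `rmse()` are hypothetical, introduced only for this illustration.

```r
# Hypothetical holdout predictions from three 1-step-ahead resamples
# (invented numbers, for illustration only)
pred_df <- data.frame(
  Resample = c("Training010", "Training011", "Training012"),
  obs      = c(100, 110, 120),
  pred     = c(103, 106, 128)
)

# Hypothetical helper: root mean squared error
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Average of per-resample metrics (what train() reports)
per_resample <- sapply(split(pred_df, pred_df$Resample),
                       function(d) rmse(d$obs, d$pred))
mean_of_resamples <- mean(per_resample)

# One metric over all pooled predictions (what a summary over $pred gives)
pooled <- rmse(pred_df$obs, pred_df$pred)

per_resample       # 3, 4 and 8: one row each, so RMSE is the absolute error
mean_of_resamples  # 5
pooled             # sqrt((9 + 16 + 64) / 3), about 5.45
```

Because squaring weights large errors more heavily before the square root is taken, the pooled RMSE is generally not equal to the mean of the per-resample RMSEs. On the fitted model, the corresponding check is to compare mean(lm.fit.ec$resample$RMSE) with lm.fit.ec$results$RMSE.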
    
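Q2. What data are the RMSE, Rsquared and MAE in $resample calculated from?

A: From the holdout rows of each resample, i.e. the rows of $pred that belong to that resample. For the first resample: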

    lm.fit.ec
    head(lm.fit.ec$resample)
    
    > head(lm.fit.ec$resample)
           RMSE Rsquared       MAE intercept    Resample
    1  16.33273       NA  16.33273      TRUE Training010
    2 232.16184       NA 232.16184      TRUE Training011
    3 197.65143       NA 197.65143      TRUE Training012
    4 393.29469       NA 393.29469      TRUE Training013
    5 129.99157       NA 129.99157      TRUE Training014
    6  60.95649       NA  60.95649      TRUE Training015
    
    
    first_holdout <- subset(lm.fit.ec$pred, Resample == "Training010")
    first_holdout
    
    > first_holdout
    pred        obs rowIndex intercept    Resample
    1 2756.333 2740       11      TRUE Training010  # only 1 row since 1 step forecast horizon
    
    
    # Calculate RMSE, Rsquared and MAE for the holdout set
    postResample(first_holdout$pred, first_holdout$obs)
    
    > postResample(first_holdout$pred, first_holdout$obs)
    RMSE     Rsquared      MAE 
    16.33273       NA     16.33273
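To make "which data" concrete: with initialWindow = 10, horizon = 1, skip = 0 and a fixed window, each resample trains on 10 consecutive rows and is scored on the single row that follows. The function `time_slices()` below is a hypothetical, simplified re-implementation written only for illustration (in real code use caret::createTimeSlices()); it assumes caret labels each slice by its last training row, which is consistent with Training010 and rowIndex 11 in the output above.

```r
# Hypothetical simplified rolling-origin splitter (illustration only;
# use caret::createTimeSlices() in real code)
time_slices <- function(n, initialWindow, horizon = 1,
                        fixedWindow = TRUE, skip = 0) {
  # Last training row of each slice
  stops <- seq(initialWindow, n - horizon, by = skip + 1)
  # Fixed window: the start slides forward; growing window: always row 1
  starts <- if (fixedWindow) stops - initialWindow + 1 else rep(1, length(stops))
  train <- mapply(seq, starts, stops, SIMPLIFY = FALSE)
  test  <- mapply(seq, stops + 1, stops + horizon, SIMPLIFY = FALSE)
  # Zero-padded labels such as "Training010"
  labels <- formatC(as.integer(stops), width = nchar(max(stops)),
                    format = "d", flag = "0")
  names(train) <- paste0("Training", labels)
  names(test)  <- paste0("Testing", labels)
  list(train = train, test = test)
}

slices <- time_slices(n = 120, initialWindow = 10, horizon = 1)
slices$train[["Training010"]]  # rows 1 to 10
slices$test[["Testing010"]]    # row 11, the single holdout row
length(slices$test)            # 110 resamples (120 - 10)
```

With a 1-step horizon every test element has length one, so each row of $resample summarises exactly one prediction from $pred.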
    

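My confusion here was mainly because Rsquared is NA. Since the forecast horizon is one step, every holdout sample has only one row, so Rsquared cannot be calculated: caret's R² is based on the correlation between predicted and observed values, which is undefined for a single point. For the same reason, RMSE and MAE are identical in every resample.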
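The NA can be reproduced without caret. `holdout_metrics()` below is a hypothetical, simplified metric function written for illustration, not caret's postResample() source; it assumes R² is the squared correlation between predictions and observations, which needs at least two distinct values in each vector. The inputs are the rounded values printed for Training010, so the result is 16.333 rather than exactly 16.33273.

```r
# Hypothetical simplified metric function (illustration only,
# not caret's actual postResample() implementation)
holdout_metrics <- function(pred, obs) {
  err <- pred - obs
  # A correlation needs at least two distinct values in each vector
  r2 <- if (length(unique(pred)) < 2 || length(unique(obs)) < 2) {
    NA_real_
  } else {
    cor(pred, obs)^2
  }
  c(RMSE = sqrt(mean(err^2)), Rsquared = r2, MAE = mean(abs(err)))
}

# One-row holdout (rounded values printed for resample Training010)
m <- holdout_metrics(pred = 2756.333, obs = 2740)
m   # RMSE and MAE both equal |error| = 16.333; Rsquared is NA

# A longer horizon gives several rows per holdout, so R² becomes defined
m3 <- holdout_metrics(pred = c(10, 12, 15), obs = c(11, 12, 14))
m3
```

For a single row, sqrt(mean(err^2)) and mean(abs(err)) both reduce to abs(err), which is why the RMSE and MAE columns of $resample match.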
