Combining nesting and rolling_origin from Tidymodels in R
I am trying to train a random forest using rolling_origin from the Tidymodels suite. I want the folds to be exactly the months of the year. Nesting looks like it could achieve this, but when the data is nested, tune_grid cannot find the variables. How can I make this work? I have included a reproducible example below.
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(yardstick))

# Create dummy data ====================================================================================================
dates <- seq(from = as.Date("2019-01-01"), to = as.Date("2019-12-31"), by = 'day')
l <- length(dates)
set.seed(1)
data_set <- data.frame(
  date = dates,
  v1 = rnorm(l),
  v2 = rnorm(l),
  v3 = rnorm(l),
  y = rnorm(l)
)

# Random Forest Model =================================================================================================
model <-
  parsnip::rand_forest(
    mode = "regression",
    trees = tune()) %>%
  set_engine("ranger")

# grid specification
params <-
  dials::parameters(
    trees()
  )

# Set up grid and model workflow =======================================================================================
grid <-
  dials::grid_max_entropy(
    params,
    size = 2
  )

form <- as.formula(paste("y ~ v1 + v2 + v3"))

model_workflow <-
  workflows::workflow() %>%
  add_model(model) %>%
  add_formula(form)

# Tuning on the normal data set works ====================================================================================================
data_ro_day <- data_set %>%
  rolling_origin(
    initial = 304,
    assess = 30,
    cumulative = TRUE,
    skip = 30
  )

results <- tune_grid(
  model_workflow,
  grid = grid,
  resamples = data_ro_day,
  param_info = params,
  metrics = metric_set(mae, mape, rmse, rsq),
  control = control_grid(verbose = TRUE))

results %>% show_best("mape", n = 2)

# Tuning on the nested data set doesn't work =========================================================================================
data_ro_month <- data_set %>%
  mutate(year_month = format(date, "%Y-%m")) %>%
  nest(-year_month) %>%
  rolling_origin(
    initial = 10,
    assess = 1,
    cumulative = TRUE
  )

results <- tune_grid(
  model_workflow,
  grid = grid,
  resamples = data_ro_month,
  param_info = params,
  metrics = metric_set(mae, mape, rmse, rsq),
  control = control_grid(verbose = TRUE))

results$.notes
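A likely reason the nested version fails is that, after nesting, the top-level data frame no longer has columns named v1, v2, v3, or y; they live inside a list column. The sketch below rebuilds the same dummy data and inspects the nested structure. It uses the `nest(data = -year_month)` spelling, which is an assumption about the installed tidyr version (1.0 or later) and not the call from the question.

```r
library(dplyr)
library(tidyr)

# Same dummy data as in the question
dates <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")
set.seed(1)
data_set <- data.frame(
  date = dates,
  v1 = rnorm(length(dates)),
  v2 = rnorm(length(dates)),
  v3 = rnorm(length(dates)),
  y = rnorm(length(dates))
)

# One row per month; everything else moves into the list column `data`
nested <- data_set %>%
  mutate(year_month = format(date, "%Y-%m")) %>%
  nest(data = -year_month)

names(nested)            # top-level columns after nesting
nrow(nested)             # one row per month
"v1" %in% names(nested)  # the predictors are not visible at the top level
names(nested$data[[1]])  # they are inside each nested data frame
```

Resamples built from `nested` therefore hand the workflow a data frame of months, so a formula such as `y ~ v1 + v2 + v3` has nothing to match against.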
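I'm not entirely clear on how you want to split your data for tuning, but I'd suggest looking into some of the other rsample functions such as sliding_window() and especially sliding_period(). They let you create an experimental design for tuning in which you fit on certain months of data, then assess on another month, sliding through all the months you have available: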
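library(tidymodels)

dates <- seq(from = as.Date("2019-01-01"), to = as.Date("2019-12-31"), by = "day")
l <- length(dates)
set.seed(1)
data_set <- data.frame(
  date = dates,
  v1 = rnorm(l),
  v2 = rnorm(l),
  v3 = rnorm(l),
  y = rnorm(l)
)

month_folds <- data_set %>%
  sliding_period(
    date,
    "month",
    lookback = Inf,
    skip = 4
  )

month_folds
#>   splits id
#> 1 Slice1
#> 2 Slice2
#> 3 Slice3
#> 4 Slice4
#> 5 Slice5
#> 6 Slice6
#> 7 Slice7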
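I used skip = 4 here to keep only the slices that have more training data available. Each slice trains on several months of data and is evaluated on new data from the month that follows. The resampling slides forward through the data set. Because I used lookback = Inf, it always includes all past data, but you can change that.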
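To see what each slice actually contains, a minimal sketch like the following can help. It rebuilds the dummy data, creates the monthly slices with `sliding_period()`, and summarises the date range and row counts of the analysis and assessment sets. The name `slice_summary` is introduced here purely for illustration.

```r
library(rsample)

# Dummy daily data for 2019, as in the question
dates <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")
set.seed(1)
data_set <- data.frame(
  date = dates,
  v1 = rnorm(length(dates)),
  v2 = rnorm(length(dates)),
  v3 = rnorm(length(dates)),
  y = rnorm(length(dates))
)

# Monthly slices that keep all past data and drop the first four slices
month_folds <- sliding_period(data_set, date, "month", lookback = Inf, skip = 4)

# One summary row per slice: training range, test range, and sizes
slice_summary <- do.call(rbind, lapply(seq_len(nrow(month_folds)), function(i) {
  split <- month_folds$splits[[i]]
  train <- analysis(split)    # data the model is fit on
  test <- assessment(split)   # data the model is evaluated on
  data.frame(
    id = month_folds$id[i],
    train_start = min(train$date),
    train_end = max(train$date),
    test_start = min(test$date),
    test_end = max(test$date),
    n_train = nrow(train),
    n_test = nrow(test)
  )
}))

slice_summary
```

Replacing `lookback = Inf` with a finite number (for example `lookback = 2`) should give a fixed-width training window instead of an expanding one; rerunning the summary is a quick way to check that the window matches what you intend.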
Once you have set up a resampling approach that fits your domain problem, you can create a model specification and tune it:
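# model arguments assumed to match the question's specification
rf_spec <- rand_forest(mode = "regression", trees = tune()) %>%
  set_engine("ranger")

rf_wf <- workflow() %>%
  add_model(rf_spec) %>%
  add_formula(y ~ v1 + v2 + v3)

tune_grid(rf_wf, resamples = month_folds)
#> # Tuning results
#> # Sliding period resampling
#> # A tibble: 7 x 4
#>   splits id .metrics .notes
#> 1 Slice1
#> 2 Slice2
#> 3 Slice3
#> 4 Slice4
#> 5 Slice5
#> 6 Slice6
#> 7 Slice7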
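Once tuning has run, the results can be summarised and the best setting plugged back into the workflow. The following is a sketch under stated assumptions: the model arguments mirror the question, `grid = 3` and the choice of `rmse` as the selection metric are illustrative, and the names `tuned`, `best`, and `final_wf` are hypothetical.

```r
library(tidymodels)

# Dummy daily data for 2019, as in the question
dates <- seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day")
set.seed(1)
data_set <- data.frame(
  date = dates,
  v1 = rnorm(length(dates)),
  v2 = rnorm(length(dates)),
  v3 = rnorm(length(dates)),
  y = rnorm(length(dates))
)

month_folds <- sliding_period(data_set, date, "month", lookback = Inf, skip = 4)

# Assumed model specification (taken from the question)
rf_spec <- rand_forest(mode = "regression", trees = tune()) %>%
  set_engine("ranger")

rf_wf <- workflow() %>%
  add_model(rf_spec) %>%
  add_formula(y ~ v1 + v2 + v3)

# Small illustrative grid: three candidate values for `trees`
tuned <- tune_grid(
  rf_wf,
  resamples = month_folds,
  grid = 3,
  metrics = metric_set(rmse, mae)
)

collect_metrics(tuned)                       # metrics averaged over the slices
best <- select_best(tuned, metric = "rmse")  # best candidate by RMSE
final_wf <- finalize_workflow(rf_wf, best)   # workflow with `trees` filled in
```

Since the dummy outcome is pure noise, the metrics themselves are not meaningful here; the point is only the shape of the workflow from resamples to a finalized model.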
Created on 2020-11-15 by the reprex package (v0.3.0.9001)

Thanks Julia, I didn't know about sliding_period, it works great!