R 使用Fable聚合预测_R_Dplyr_Forecasting_Fable_Tidyverts

R 使用Fable聚合预测

R 使用Fable聚合预测,r,dplyr,forecasting,fable,tidyverts,R,Dplyr,Forecasting,Fable,Tidyverts,问题：使用fable，我可以轻松地生成具有分组结构的时间序列预测，甚至可以使用fable的聚合_键/协调语法生成连贯的顶层预测。但是，我无法使用此方法轻松访问聚合预测，我使用的替代方法包括放弃fable（预测表）结构。有人能告诉我，是否有一种更简单的/预期的方法可以使用该软件包实现这一点吗？正如您在示例中所看到的，我可以使用其他方法实现，但我想知道是否有更好的方法。感谢您的帮助方法1: 我在不使用aggregate\u键/concurrence的情况下总结预测的工作主要是使用dplyr的gr

问题： 使用fable，我可以轻松地生成具有分组结构的时间序列预测，甚至可以使用fable的

聚合_键

协调

语法生成连贯的顶层预测。但是，我无法使用此方法轻松访问聚合预测，我使用的替代方法包括放弃fable（预测表）结构。有人能告诉我，是否有一种更简单的/预期的方法可以使用该软件包实现这一点吗？正如您在示例中所看到的，我可以使用其他方法实现，但我想知道是否有更好的方法。感谢您的帮助

方法1: 我在不使用

aggregate\u键

concurrence

的情况下总结预测的工作主要是使用dplyr的

group\u by

和

summary

，但是预测的预测间隔格式为正态分布对象，这似乎不支持使用此方法进行总结。为了解决这个问题，我一直在使用

hilo

和

unpack\u hilo

来提取不同预测间隔的边界，然后可以使用通常的方法求和。然而，我真的希望保留寓言结构和分布对象，这是不可能使用这种方法的

方法2: 另一种方法是使用

aggregate\u键

协调

似乎只支持使用

minu trace

进行聚合。我知道这种方法是为了最佳调节，而我想要的是一个简单的自下而上的总体预测。感觉应该有一种简单的方法可以使用这种语法获得自底向上的预测，但到目前为止我还没有找到。此外，即使使用

minu trace

我也不确定如何访问聚合预测本身，正如您在示例中看到的那样

使用方法1的示例：

library(fable)
#> Loading required package: fabletools
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

lung_deaths_agg <- as_tsibble(cbind(mdeaths, fdeaths))
  
fc_1 <- lung_deaths_agg %>% 
  model(lm = TSLM(value ~ trend() + season())) %>% 
  forecast()

fc_1
#> # A fable: 48 x 5 [1M]
#> # Key:     key, .model [2]
#>    key     .model    index        value .mean
#>    <chr>   <chr>     <mth>       <dist> <dbl>
#>  1 fdeaths lm     1980 Jan N(794, 5940)  794.
#>  2 fdeaths lm     1980 Feb N(778, 5940)  778.
#>  3 fdeaths lm     1980 Mar N(737, 5940)  737.
#>  4 fdeaths lm     1980 Apr N(577, 5940)  577.
#>  5 fdeaths lm     1980 May N(456, 5940)  456.
#>  6 fdeaths lm     1980 Jun N(386, 5940)  386.
#>  7 fdeaths lm     1980 Jul N(379, 5940)  379.
#>  8 fdeaths lm     1980 Aug N(335, 5940)  335.
#>  9 fdeaths lm     1980 Sep N(340, 5940)  340.
#> 10 fdeaths lm     1980 Oct N(413, 5940)  413.
#> # ... with 38 more rows

fc_1 %>%
  hilo() %>% 
  unpack_hilo(c(`80%`, `95%`)) %>% 
  as_tibble() %>% 
  group_by(index) %>% 
  summarise(across(c(.mean, ends_with("upper"), ends_with("lower")), sum))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 24 x 6
#>       index .mean `80%_upper` `95%_upper` `80%_lower` `95%_lower`
#>       <mth> <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#>  1 1980 Jan 2751.       3089.       3267.       2414.       2236.
#>  2 1980 Feb 2687.       3024.       3202.       2350.       2171.
#>  3 1980 Mar 2535.       2872.       3051.       2198.       2020.
#>  4 1980 Apr 2062.       2399.       2577.       1725.       1546.
#>  5 1980 May 1597.       1934.       2113.       1260.       1082.
#>  6 1980 Jun 1401.       1738.       1916.       1064.        885.
#>  7 1980 Jul 1343.       1680.       1858.       1006.        827.
#>  8 1980 Aug 1200.       1538.       1716.        863.        685.
#>  9 1980 Sep 1189.       1527.       1705.        852.        674.
#> 10 1980 Oct 1482.       1819.       1998.       1145.        967.
#> # ... with 14 more rows

fc_2 <- lung_deaths_agg %>%
  aggregate_key(key, value = sum(value)) %>% 
  model(lm = TSLM(value ~ trend() + season())) %>%
  reconcile(lm = min_trace(lm)) %>% 
  forecast()

fc_2
#> # A fable: 72 x 5 [1M]
#> # Key:     key, .model [3]
#>    key     .model    index        value .mean
#>    <chr>   <chr>     <mth>       <dist> <dbl>
#>  1 fdeaths lm     1980 Jan N(794, 5606)  794.
#>  2 fdeaths lm     1980 Feb N(778, 5606)  778.
#>  3 fdeaths lm     1980 Mar N(737, 5606)  737.
#>  4 fdeaths lm     1980 Apr N(577, 5606)  577.
#>  5 fdeaths lm     1980 May N(456, 5606)  456.
#>  6 fdeaths lm     1980 Jun N(386, 5606)  386.
#>  7 fdeaths lm     1980 Jul N(379, 5606)  379.
#>  8 fdeaths lm     1980 Aug N(335, 5606)  335.
#>  9 fdeaths lm     1980 Sep N(340, 5606)  340.
#> 10 fdeaths lm     1980 Oct N(413, 5606)  413.
#> # ... with 62 more rows

fc_2 %>% as_tibble() %>% select(key) %>% slice(50:55)
#> # A tibble: 6 x 1
#>   key         
#>   <chr>       
#> 1 <aggregated>
#> 2 <aggregated>
#> 3 <aggregated>
#> 4 <aggregated>
#> 5 <aggregated>
#> 6 <aggregated>

fc_2 %>% as_tibble() %>% select(key) %>% filter(key == "<aggregated>")
#> # A tibble: 0 x 1
#> # ... with 1 variable: key <chr>

库（寓言）
#>加载所需包：fabletools
图书馆（dplyr）
#> 
#>正在附加包：“dplyr”
#>以下对象已从“package:stats”屏蔽：
#> 
#>滤波器，滞后
#>以下对象已从“package:base”屏蔽：
#> 
#>相交、setdiff、setequal、并集
肺死亡人数合计%
预测（）
fc_1
#>#寓言：48 x 5[1M]
#>#Key:Key、.model[2]
#>关键。模型指数值。平均值
#>                    
#>1 fdeaths lm 1980年1月N日（7945940）794。
#>两名死者1980年2月N日（7785940）778。
#>3 fdeaths lm 1980年3月N日（7375940）737。
#>4 fdeaths lm 1980年4月N日（5775940）577。
#>5 Fdeath lm 1980年5月N日（4565940）456。
#>6 fdeaths lm 1980年6月N日（3865940）386。
#>7 F死亡lm 1980年7月N日（3795940）379。
#>8 fdeaths lm 1980年8月N日（3355940）335。
#>9月19日1980年9月N日（3405940）340。
#>10 fdeaths lm 1980年10月N日（4135940）413。
#> # ... 还有38行
fc_1%>%
hilo（）%>%
打开包装（c（`80%`，`95%`））%>%
as_tible（）%>%
分组依据（指数）%>%
总结（跨越（c（.mean，以（“上限”）结尾，以（“下限”）结尾，总和））
#>`summary（）`解组输出（用`.groups`参数重写）
#>#tibble:24 x 6
#>指数。平均值'80%'上''95%'上''80%'下''95%'下`
#>                                    
#>1980年1月1日至2751年1月1日。3089326724142236
#>1980年2月2687日。3024320223502171
#>3 1980年3月2535日。2872305121982020
#>1980年4月4日2062年4月。2399257717251546
#>5 1980年5月1597日。1934211312601082
#>1980年6月6日1401年6月。173819161064885
#>1980年7月1343日。168018581006827
#>1980年8月8日至1200日。15381716863685
#>1980年9月9日至1189年9月9日。15271705852674
#>1980年10月10日1482年10月。181919981145967
#> # ... 还有14行

使用方法2的示例：

library(fable)
#> Loading required package: fabletools
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

lung_deaths_agg <- as_tsibble(cbind(mdeaths, fdeaths))
  
fc_1 <- lung_deaths_agg %>% 
  model(lm = TSLM(value ~ trend() + season())) %>% 
  forecast()

fc_1
#> # A fable: 48 x 5 [1M]
#> # Key:     key, .model [2]
#>    key     .model    index        value .mean
#>    <chr>   <chr>     <mth>       <dist> <dbl>
#>  1 fdeaths lm     1980 Jan N(794, 5940)  794.
#>  2 fdeaths lm     1980 Feb N(778, 5940)  778.
#>  3 fdeaths lm     1980 Mar N(737, 5940)  737.
#>  4 fdeaths lm     1980 Apr N(577, 5940)  577.
#>  5 fdeaths lm     1980 May N(456, 5940)  456.
#>  6 fdeaths lm     1980 Jun N(386, 5940)  386.
#>  7 fdeaths lm     1980 Jul N(379, 5940)  379.
#>  8 fdeaths lm     1980 Aug N(335, 5940)  335.
#>  9 fdeaths lm     1980 Sep N(340, 5940)  340.
#> 10 fdeaths lm     1980 Oct N(413, 5940)  413.
#> # ... with 38 more rows

fc_1 %>%
  hilo() %>% 
  unpack_hilo(c(`80%`, `95%`)) %>% 
  as_tibble() %>% 
  group_by(index) %>% 
  summarise(across(c(.mean, ends_with("upper"), ends_with("lower")), sum))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 24 x 6
#>       index .mean `80%_upper` `95%_upper` `80%_lower` `95%_lower`
#>       <mth> <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#>  1 1980 Jan 2751.       3089.       3267.       2414.       2236.
#>  2 1980 Feb 2687.       3024.       3202.       2350.       2171.
#>  3 1980 Mar 2535.       2872.       3051.       2198.       2020.
#>  4 1980 Apr 2062.       2399.       2577.       1725.       1546.
#>  5 1980 May 1597.       1934.       2113.       1260.       1082.
#>  6 1980 Jun 1401.       1738.       1916.       1064.        885.
#>  7 1980 Jul 1343.       1680.       1858.       1006.        827.
#>  8 1980 Aug 1200.       1538.       1716.        863.        685.
#>  9 1980 Sep 1189.       1527.       1705.        852.        674.
#> 10 1980 Oct 1482.       1819.       1998.       1145.        967.
#> # ... with 14 more rows

fc_2 <- lung_deaths_agg %>%
  aggregate_key(key, value = sum(value)) %>% 
  model(lm = TSLM(value ~ trend() + season())) %>%
  reconcile(lm = min_trace(lm)) %>% 
  forecast()

fc_2
#> # A fable: 72 x 5 [1M]
#> # Key:     key, .model [3]
#>    key     .model    index        value .mean
#>    <chr>   <chr>     <mth>       <dist> <dbl>
#>  1 fdeaths lm     1980 Jan N(794, 5606)  794.
#>  2 fdeaths lm     1980 Feb N(778, 5606)  778.
#>  3 fdeaths lm     1980 Mar N(737, 5606)  737.
#>  4 fdeaths lm     1980 Apr N(577, 5606)  577.
#>  5 fdeaths lm     1980 May N(456, 5606)  456.
#>  6 fdeaths lm     1980 Jun N(386, 5606)  386.
#>  7 fdeaths lm     1980 Jul N(379, 5606)  379.
#>  8 fdeaths lm     1980 Aug N(335, 5606)  335.
#>  9 fdeaths lm     1980 Sep N(340, 5606)  340.
#> 10 fdeaths lm     1980 Oct N(413, 5606)  413.
#> # ... with 62 more rows

fc_2 %>% as_tibble() %>% select(key) %>% slice(50:55)
#> # A tibble: 6 x 1
#>   key         
#>   <chr>       
#> 1 <aggregated>
#> 2 <aggregated>
#> 3 <aggregated>
#> 4 <aggregated>
#> 5 <aggregated>
#> 6 <aggregated>

fc_2 %>% as_tibble() %>% select(key) %>% filter(key == "<aggregated>")
#> # A tibble: 0 x 1
#> # ... with 1 variable: key <chr>

fc_2%
聚合键（键，值=和（值））%>%
模型（lm=TSLM（值~趋势（）+季节（））%>%
协调（lm=最小跟踪（lm））%>%
预测（）
fc_2
#>#寓言：72 x 5[1M]
#>#Key:Key、.model[3]
#>关键。模型指数值。平均值
#>                    
#>1 fdeaths lm 1980年1月N日（7945606）794。
#>两名死者1980年2月N日（7785606）778。
#>3 fdeaths lm 1980年3月N日（7375606）737。
#>4 fdeaths lm 1980年4月N日（5775606）577。
#>5 Fdeath lm 1980年5月N（4565606）456。
#>6 fdeaths lm 1980年6月N日（3865606）386。
#>7佛得角1980年7月N日（3795606）379。
#>8 fdeaths lm 1980年8月N日（3355606）335。
#>9月19日1980年9月N日（3405606）340。
#>10 fdeaths lm 1980年10月N日（4135606）413。
#> # ... 还有62行
fc_2%%>%作为可存储（）%%>%选择（键）%%>%切片（50:55）
#>#tibble:6 x 1
#>钥匙
#>          
#> 1 
#> 2 
#> 3 
#> 4 
#> 5 
#> 6 
fc_2%%>%as_tible（）%%>%select（key）%%>%filter（key==“”）
#>#tible:0 x 1
#> # ... 带1个变量：key

方法1:

library(fable)
#> Loading required package: fabletools
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

lung_deaths_agg <- as_tsibble(cbind(mdeaths, fdeaths))
  
fc_1 <- lung_deaths_agg %>% 
  model(lm = TSLM(value ~ trend() + season())) %>% 
  forecast()

fc_1
#> # A fable: 48 x 5 [1M]
#> # Key:     key, .model [2]
#>    key     .model    index        value .mean
#>    <chr>   <chr>     <mth>       <dist> <dbl>
#>  1 fdeaths lm     1980 Jan N(794, 5940)  794.
#>  2 fdeaths lm     1980 Feb N(778, 5940)  778.
#>  3 fdeaths lm     1980 Mar N(737, 5940)  737.
#>  4 fdeaths lm     1980 Apr N(577, 5940)  577.
#>  5 fdeaths lm     1980 May N(456, 5940)  456.
#>  6 fdeaths lm     1980 Jun N(386, 5940)  386.
#>  7 fdeaths lm     1980 Jul N(379, 5940)  379.
#>  8 fdeaths lm     1980 Aug N(335, 5940)  335.
#>  9 fdeaths lm     1980 Sep N(340, 5940)  340.
#> 10 fdeaths lm     1980 Oct N(413, 5940)  413.
#> # ... with 38 more rows

fc_1 %>%
  hilo() %>% 
  unpack_hilo(c(`80%`, `95%`)) %>% 
  as_tibble() %>% 
  group_by(index) %>% 
  summarise(across(c(.mean, ends_with("upper"), ends_with("lower")), sum))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 24 x 6
#>       index .mean `80%_upper` `95%_upper` `80%_lower` `95%_lower`
#>       <mth> <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#>  1 1980 Jan 2751.       3089.       3267.       2414.       2236.
#>  2 1980 Feb 2687.       3024.       3202.       2350.       2171.
#>  3 1980 Mar 2535.       2872.       3051.       2198.       2020.
#>  4 1980 Apr 2062.       2399.       2577.       1725.       1546.
#>  5 1980 May 1597.       1934.       2113.       1260.       1082.
#>  6 1980 Jun 1401.       1738.       1916.       1064.        885.
#>  7 1980 Jul 1343.       1680.       1858.       1006.        827.
#>  8 1980 Aug 1200.       1538.       1716.        863.        685.
#>  9 1980 Sep 1189.       1527.       1705.        852.        674.
#> 10 1980 Oct 1482.       1819.       1998.       1145.        967.
#> # ... with 14 more rows

fc_2 <- lung_deaths_agg %>%
  aggregate_key(key, value = sum(value)) %>% 
  model(lm = TSLM(value ~ trend() + season())) %>%
  reconcile(lm = min_trace(lm)) %>% 
  forecast()

fc_2
#> # A fable: 72 x 5 [1M]
#> # Key:     key, .model [3]
#>    key     .model    index        value .mean
#>    <chr>   <chr>     <mth>       <dist> <dbl>
#>  1 fdeaths lm     1980 Jan N(794, 5606)  794.
#>  2 fdeaths lm     1980 Feb N(778, 5606)  778.
#>  3 fdeaths lm     1980 Mar N(737, 5606)  737.
#>  4 fdeaths lm     1980 Apr N(577, 5606)  577.
#>  5 fdeaths lm     1980 May N(456, 5606)  456.
#>  6 fdeaths lm     1980 Jun N(386, 5606)  386.
#>  7 fdeaths lm     1980 Jul N(379, 5606)  379.
#>  8 fdeaths lm     1980 Aug N(335, 5606)  335.
#>  9 fdeaths lm     1980 Sep N(340, 5606)  340.
#> 10 fdeaths lm     1980 Oct N(413, 5606)  413.
#> # ... with 62 more rows

fc_2 %>% as_tibble() %>% select(key) %>% slice(50:55)
#> # A tibble: 6 x 1
#>   key         
#>   <chr>       
#> 1 <aggregated>
#> 2 <aggregated>
#> 3 <aggregated>
#> 4 <aggregated>
#> 5 <aggregated>
#> 6 <aggregated>

fc_2 %>% as_tibble() %>% select(key) %>% filter(key == "<aggregated>")
#> # A tibble: 0 x 1
#> # ... with 1 variable: key <chr>

在将内容添加到一起时，使用分发需要更多的注意（而不是数字）。更具体地说，可以添加正态分布的平均值而不会产生问题：

库（分布式）
平均值（距离正态（2,3）+距离正态（4,1））
#> [1] 6
平均值（距离正常（2,3））+平均值（距离正常（4,1））
#> [1] 6

由（v0.3.0）于2020-07-03创建

但是分位数（用于产生80%和95%的间隔）不能：

库（分布式）
分位数（距离正常（2,3）+距离正常（4,1），0.9）
#> [1] 10.05262
分位数（距离正常（2,3），0.9）+分位数（距离正常（4,1），0.9）
#> [1] 11.12621

由（v0.3.0）于2020-07-03创建

如果要聚合分布，则需要计算分布本身的总和：

库（寓言）
L