Outlier detection in R using ARIMA with exogenous regressors


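I want to detect outliers in real-time data that is aggregated hourly. For this example I have chosen hourly pedestrian data from Melbourne, Australia.

[...], which I will learn and use in due course. In the short term, I want to use the simplest approach. @Aksakal outlined one in the following stackexchange post: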

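I think the key is the "unexpected" qualifier in your graph. In order to detect the unexpected you need to have an idea of what is expected.

I would start with a simple time series model such as AR(p) or ARMA(p,q). Fit it to the data and add seasonality as appropriate. For instance, your SAR(1)(24) model could be: $y_t = c + \phi y_{t-1} + \Phi_{24} y_{t-24} + \Phi_{25} y_{t-25} + \varepsilon_t$, where $t$ is time in hours. So you would be predicting the graph for the next hour. Whenever the prediction error $e_t = y_t - \hat{y}_t$ is too big, you raise an alert.

When you estimate the model you get the variance $\sigma_\varepsilon$ of the error $\varepsilon_t$. Depending on your distributional assumptions, such as normal, you can set the threshold based on probability, for example raising an alert when $|e_t| > 3\sigma_\varepsilon$.

The number of visitors is probably rather persistent, but super seasonal. It might work better to use seasonal dummies instead of multiplicative seasonality; then you would try ARMAX, where X stands for exogenous variables, which could be holiday dummies, hour dummies, weekend dummies, etc.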
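As a minimal sketch of the rule quoted above (this is not code from the quoted post), the following fits an AR(1) with hourly dummy regressors to simulated data using base R's `arima()` and flags residuals larger than three standard deviations. The simulated series, the anomaly injected at index 500, and all variable names are assumptions made purely for illustration.

```r
# Sketch: ARMAX-style alerting on simulated hourly data (all values are made up).
set.seed(42)
n    <- 24 * 30                                   # 30 days of hourly observations
hour <- factor(rep(0:23, length.out = n))

# Hour-of-day dummies as exogenous regressors; drop the intercept column.
xreg <- model.matrix(~ hour)[, -1]

# Simulated series: level + daily profile + AR(1) noise with unit innovation sd.
daily_profile <- 10 * sin(2 * pi * (0:23) / 24)
y <- 100 + daily_profile[as.integer(hour)] +
  as.numeric(arima.sim(list(ar = 0.6), n = n))
y[500] <- y[500] + 15                             # injected anomaly (assumption)

# Regression on the dummies with AR(1) errors.
fit <- arima(y, order = c(1, 0, 0), xreg = xreg)

e         <- residuals(fit)                       # one-step-ahead errors e_t
sigma_eps <- sqrt(fit$sigma2)                     # estimated innovation sd
flagged   <- which(abs(e) > 3 * sigma_eps)        # indices that would raise an alert
print(flagged)
```

Because the anomaly is part of the data used for fitting, it slightly inflates the estimated innovation variance; in a real-time setting the model would normally be fitted on history and applied to new observations.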

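Unfortunately, the post does not go into detail, so I have a couple of questions:

Question 1) How do I compute or extract the variance $\sigma_\varepsilon$ of the ARIMA error term $\varepsilon_t$ from the fitted model produced by auto.arima(data, xreg = xreg)? The complete R example at the end of this post uses multiple seasonalities to capture daily, weekly and yearly seasonality. It is not optimised and is provided only as an example implementation to help answer Question 2.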
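For Question 1, one possible route (a sketch, assuming the `forecast` package is installed) is to read the `sigma2` element of the fitted object and the innovation residuals via `residuals(fit, type = "innovation")`. The toy data and the names `x`, `y`, `sigma_eps` below are illustrative assumptions, not part of the pedestrian example. Note that when `lambda` is set, the model is fitted to the Box-Cox transformed series, so `sigma2` should be read as being on the transformed (here, log) scale.

```r
library(forecast)

# Toy data (assumption): a positive series driven by one regressor plus AR(1) noise.
set.seed(1)
n <- 300
x <- cbind(x1 = rnorm(n))
y <- ts(exp(3 + 0.5 * x[, "x1"] +
              as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 0.2))))

# Regression with ARIMA errors, fitted on the log scale (lambda = 0).
fit <- auto.arima(y, xreg = x, seasonal = FALSE, lambda = 0)

sigma2_eps <- fit$sigma2                           # estimated innovation variance
sigma_eps  <- sqrt(sigma2_eps)                     # corresponding standard deviation
e          <- residuals(fit, type = "innovation")  # estimated ARIMA errors

print(c(sigma2 = sigma2_eps, sigma = sigma_eps))
```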

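I would like to forecast the threshold for a full year (or at least 30 days), which means h = 24 hours * 30 = 720. Essentially, I want to forecast not the mean hourly pedestrian count but an upper bound (e.g., $3\sigma_\varepsilon$) on the expected number of pedestrians for each hour, for h >> 1 (e.g., h = 720 hours (30 days) or even h = 24*365 = 8760 hours (1 year)).

Question 2) How can I achieve this using the approach above?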
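For Question 2, one option (a sketch under stated assumptions, not a definitive answer) is to take the upper prediction bound from `forecast()` with a level matching a two-sided 3-sigma band, about 99.73%. The simulated series, the fixed AR(1) error order, and `K = 3` Fourier terms are assumptions chosen to keep the example small. The interval at horizon h reflects the h-step-ahead forecast error variance rather than the one-step innovation variance; with stationary ARMA errors it levels off at long horizons.

```r
library(forecast)

# Simulated hourly series with a daily cycle (assumption, for illustration only).
set.seed(7)
n <- 24 * 60                                      # 60 days of training data
h <- 24 * 30                                      # 30-day horizon = 720 hours
y <- ts(exp(5 + 0.8 * sin(2 * pi * (1:n) / 24) +
              as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 0.1))),
        frequency = 24)

K   <- 3                                          # number of Fourier pairs (not tuned)
fit <- Arima(y, order = c(1, 0, 0),
             xreg = fourier(y, K = K), lambda = 0)

# Two-sided coverage of a +/- 3 sigma band under normality, in percent.
level_3sigma <- 100 * (2 * pnorm(3) - 1)

fc <- forecast(fit, xreg = fourier(y, K = K, h = h), level = level_3sigma)

upper <- as.numeric(fc$upper)                     # hourly upper bound, original scale
head(upper)
```

An observed count above `upper` at the corresponding hour would then be treated as an outlier.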

Example code to help address the questions above:

library(rwalkr)
library(forecast)
library(tidyverse)
library(tsibble)
library(xts)
library(dygraphs)

pedestrian <- as_tibble(rwalkr::run_melb( year = c(2015:2018) )) 

pedestrian_statelibrary <- pedestrian %>% 
  filter(Sensor == "State Library") %>% 
  left_join(tsibble::holiday_aus(2015:2018, state='VIC'), by=c( 'Date' = 'date' )) %>%
  mutate(holiday = replace_na(holiday, ''),
         Count = ifelse(Count == 0, NA, Count))

# Zero counts were replaced with NA above so that the Box-Cox transform with lambda = 0 (log) is defined and forecasts stay positive.
pedestrian_statelibrary_train <- pedestrian_statelibrary %>% filter(Date >= as.Date('2015-05-13'), Date < as.Date('2017-01-01') )
pedestrian_statelibrary_test <- pedestrian_statelibrary %>% filter(Date >= as.Date('2017-01-01') )

# Use tsbox to convert the tibble to ts indirectly (via zoo). There must be a better way of doing this...
pedestrian_statelibrary_train_zoo <- tsbox::ts_zoo( pedestrian_statelibrary_train %>% select(Date_Time, Count) )
pedestrian_statelibrary_train_ts <-    tsbox::ts_ts(pedestrian_statelibrary_train_zoo)

pedestrian_statelibrary_test_zoo <- tsbox::ts_zoo(    pedestrian_statelibrary_test %>% select(Date_Time, Count) )
pedestrian_statelibrary_test_ts <- tsbox::ts_ts(pedestrian_statelibrary_test_zoo)


## Create external regressors.
xreg_holidays_train <- model.matrix(~as.factor(pedestrian_statelibrary_train$holiday))
xreg_holidays_train <- xreg_holidays_train[,-1]  # remove intercept.
# Remove 1st level from levels()
colnames(xreg_holidays_train) <- levels(as.factor(pedestrian_statelibrary_train$holiday))[-1]

xreg_holidays_test <- model.matrix(~as.factor(pedestrian_statelibrary_test$holiday))
xreg_holidays_test <- xreg_holidays_test[,-1]  # remove intercept.
colnames(xreg_holidays_test) <- levels(as.factor(pedestrian_statelibrary_test$holiday))[-1]

# periods (intervals(samples) per period) for hourly data.
period_day <- 24
period_week <- 24*7
period_year <- 24*365.25

seasonal_periods = c(period_day, period_week, period_year)

pedestrian_statelibrary_train_msts <- msts(pedestrian_statelibrary_train_ts,
                                     start = start(pedestrian_statelibrary_train_ts), 
                                             seasonal.periods = seasonal_periods) 

pedestrian_statelibrary_test_msts <- msts(pedestrian_statelibrary_test_ts, 
                                      start = start(pedestrian_statelibrary_test_ts), 
                                       seasonal.periods = seasonal_periods) 

# set number of Fourier terms per season. Not optimal.
Ks = c(12, 10, 2)

xreg_train <- cbind( seasonality = fourier(pedestrian_statelibrary_train_msts, K = Ks), 
                 holidays = xreg_holidays_train ) 

######################################
## Fit model of exogenous factors and ARIMA as error
######################################
fit <- pedestrian_statelibrary_train_msts %>% 
  auto.arima( xreg = xreg_train,
              seasonal=FALSE,
              stepwise = FALSE,
              parallel = TRUE,
              num.cores = NULL,
              lambda = 0
              ) 

######################################
## Forecast
######################################

fc <- forecast( fit, 
            xreg=cbind( seasonality = fourier(pedestrian_statelibrary_test_msts, K = Ks), 
                        holidays = xreg_holidays_test) 
) 

######################################
## Check residuals and accuracy.
######################################

checkresiduals(fit)

checkresiduals(fc)

accuracy(fc, pedestrian_statelibrary_test_msts)


######################################
## Display fitted model and forecast using interactive dygraph.
######################################

# Plotting `forecast` prediction using `dygraphs`
# https://stackoverflow.com/questions/43624634/plotting-forecast-prediction-using-dygraphs#43668603
as.forecast.ts <- function(forecast_obj){

  training <- forecast_obj$x
  lower <- forecast_obj$lower[,2]
  upper <- forecast_obj$upper[,2]
  point_forecast <- forecast_obj$mean

  cbind(training, lower, upper, point_forecast)
}

fc_ts <- as.forecast.ts(fc)

# Add the time stamps back to ts object.
idx_train <- pedestrian_statelibrary_train %>% ungroup() %>%    select(Date_Time) %>% as.data.frame()
idx_test <- pedestrian_statelibrary_test %>% ungroup() %>% select(Date_Time) %>% as.data.frame()
idx_all <- rbind(idx_train, idx_test)

# Append testing values to fc_ts object, by left joining two xts objects.
test_xts <- as.xts(x = pedestrian_statelibrary_test %>% 
                     dplyr::ungroup() %>%
                     as.data.frame() %>% 
                     dplyr::select( Count ) %>%
                     dplyr::rename( 'testing' = 'Count'), 
                   pedestrian_statelibrary_test$Date_Time)


fc_xts <- as.xts(x = fc_ts %>% 
                   as.data.frame(),
                 idx_all$Date_Time )

fc_xts <- fc_xts %>% xts::merge.xts(test_xts, join='left')

dygraph(data = fc_xts, main = "Pedestrian traffic Forecasting for State Library.") %>% 
  dyRangeSelector %>%
  dySeries(name = "training", label = "Train") %>%
  dySeries(name = 'testing', label = "Test") %>%
  dySeries(name = "point_forecast", label = "Predicted") %>%
  dyLegend(show = "always", hideOnMouseOut = FALSE) %>%
  dyOptions(axisLineColor = "navy", gridLineColor = "grey")
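Comments:

Hi @SamuelLiew, I have edited the question as requested to make it more focused. I believe it is now clearer. Could you reopen the question? Cheers.

@SamuelLiew: This question is too broad and contains too much code. It is unclear what the specific problem is (any errors? unexpected results?) that requires a specific solution.

Fair call @SamuelLiew. I am hoping to get help with this on stats.SE and will update this question as needed.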