Stationarity of time series data in Python
I am trying to model time series data using ARIMA modelling in Python. I used the function
statsmodels.tsa.stattools.arma_order_select_ic
on the data series and got p and q values of 2 and 2 respectively. The code is as follows:
import pandas as pd
import statsmodels.api as sm

dates = pd.date_range('2010-11-1', '2011-01-30')  # 91 daily dates
dataseries = pd.Series([22,624,634,774,726,752,38,534,722,678,750,690,686,26,708,606,632,632,632,584,28,576,474,536,512,464,436,24,448,408,528,
602,638,640,26,658,548,620,534,422,482,26,616,612,622,598,614,614,24,644,506,522,622,526,26,22,738,582,592,408,466,568,
44,680,652,598,642,714,562,38,778,796,742,460,610,42,38,732,650,670,618,574,42,22,610,456,22,630,408,390,24], index=dates)
df = pd.DataFrame({'Consumption': dataseries})
df  # displays the frame in an interactive session
# Grid search over ARMA(p, q) with p <= 4 and q <= 2, ranked by AIC
sm.tsa.arma_order_select_ic(df, max_ar=4, max_ma=2, ic='aic')
The result is:
{'aic': 0 1 2
0 1262.244974 1264.052640 1264.601342
1 1264.098325 1261.705513 1265.604662
2 1264.743786 1265.015529 1246.347400
3 1265.427440 1266.378709 1266.430373
4 1266.358895 1267.674168 NaN, 'aic_min_order': (2, 2)}
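The 'aic_min_order' entry is the (p, q) cell with the smallest AIC in the grid. In statsmodels' result the row index is the AR order p and the column index is the MA order q; the NaN cell is a fit that produced no AIC. A minimal NumPy sketch that recovers the reported order from the table printed above (the values are copied from that output; the variable names are illustrative):

```python
import numpy as np

# AIC values copied from the arma_order_select_ic output above.
# Rows: AR order p = 0..4; columns: MA order q = 0..2; NaN = no AIC for that fit.
aic = np.array([
    [1262.244974, 1264.052640, 1264.601342],
    [1264.098325, 1261.705513, 1265.604662],
    [1264.743786, 1265.015529, 1246.347400],
    [1265.427440, 1266.378709, 1266.430373],
    [1266.358895, 1267.674168, np.nan],
])

# nanargmin skips the NaN cell and returns a flat index; convert it to (p, q).
p, q = np.unravel_index(np.nanargmin(aic), aic.shape)
min_order = (int(p), int(q))
print("aic_min_order:", min_order)  # aic_min_order: (2, 2)

# Gap between the best and the second-best cell, to see how clear-cut the choice is.
vals = np.sort(aic[~np.isnan(aic)])
gap = vals[1] - vals[0]
print("gap to runner-up:", round(float(gap), 2))  # gap to runner-up: 15.36
```

The (2, 2) cell is the only value far below the rest of the grid, so the selection does not hinge on small AIC differences.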
But when I use the Augmented Dickey-Fuller test, the result indicates that the series is not stationary:
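import statsmodels.api as sm

d_order0 = sm.tsa.adfuller(dataseries)  # defaults: autolag='AIC', regression='c'
print('adf: ', d_order0[0])
print('p-value: ', d_order0[1])
print('Critical values: ', d_order0[4])
# Fail to reject the unit root if the statistic is above the 5% critical value
if d_order0[0] > d_order0[4]['5%']:
    d = 1  # assumed: `d` (differencing order) is not defined in the posted snippet
    print('Time Series is nonstationary')
    print(d)
else:
    d = 0  # assumed, as above
    print('Time Series is stationary')
    print(d)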
The output is:
adf: -1.96448506629
p-value: 0.302358888762
Critical values: {'5%': -2.8970475206326833, '1%': -3.5117123057187376, '10%': -2.5857126912469153}
Time Series is nonstationary
1
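The if test in the snippet encodes the usual ADF decision rule: the null hypothesis is a unit root, and it is rejected only when the statistic is more negative than the critical value at the chosen level (consistent here with the p-value of about 0.30 being far above 0.05). A minimal sketch applying that rule to the numbers printed above; adf_verdict is a hypothetical helper, not a statsmodels function. It deliberately says "cannot reject" rather than "nonstationary", because failing to reject is not evidence that a unit root is present.

```python
def adf_verdict(adf_stat, crit_values, level="5%"):
    """Apply the ADF decision rule at one significance level.

    H0: the series has a unit root. Reject H0 only if the statistic is below
    (more negative than) the critical value for that level.
    """
    if adf_stat < crit_values[level]:
        return "reject unit root"
    return "cannot reject unit root"


# Statistic and critical values copied from the Python output above.
adf_stat = -1.96448506629
crit = {'5%': -2.8970475206326833, '1%': -3.5117123057187376, '10%': -2.5857126912469153}

for level in ("1%", "5%", "10%"):
    # -1.964 lies above every critical value, so each line prints "cannot reject unit root"
    print(level, adf_verdict(adf_stat, crit, level))
```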
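When I cross-checked the result in R, it indicated that the series is stationary. So why does the augmented Dickey-Fuller test report a non-stationary series?
Clearly there is some seasonality in your data, so both the ARMA modelling and the stationarity testing need to be done carefully.
Apparently the reason for the difference in the ADF test between Python and R is the default number of lags each package uses: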
> (nobs=length(dataseries))
[1] 91
> 12*(nobs/100)^(1/4) #python default
[1] 11.72038
> trunc((nobs-1)^(1/3)) #R default
[1] 4
> acf(coredata(dataseries),plot = F)
Autocorrelations of series ‘coredata(dataseries)’, by lag
0 1 2 3 4 5 6 7 8 9 10 11
1.000 0.039 -0.116 -0.124 -0.094 -0.148 0.083 0.645 -0.072 -0.135 -0.138 -0.146
12 13 14 15 16 17 18 19
-0.185 0.066 0.502 -0.097 -0.151 -0.165 -0.195 -0.160
> adf.test(dataseries,k=12)
Augmented Dickey-Fuller Test
data: dataseries
Dickey-Fuller = -2.6172, Lag order = 12, p-value = 0.322
alternative hypothesis: stationary
> adf.test(dataseries,k=4)
Augmented Dickey-Fuller Test
data: dataseries
Dickey-Fuller = -6.276, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
Warning message:
In adf.test(dataseries, k = 4) : p-value smaller than printed p-value
> adf.test(dataseries,k=7)
Augmented Dickey-Fuller Test
data: dataseries
Dickey-Fuller = -2.2571, Lag order = 7, p-value = 0.4703
alternative hypothesis: stationary
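The default-lag formulas and the ACF from the R session can be reproduced in Python with NumPy alone. As far as we know, statsmodels' adfuller uses the Schwert formula (rounded up) only as the upper bound maxlag; with its default autolag='AIC' it then picks the lag actually used at or below that bound, so 11.72 is not necessarily the lag Python ends up using. sample_acf is our own helper, written to follow the same normalisation as R's acf() (mean removed, divided by the lag-0 sum).

```python
import math

import numpy as np

# The question's daily series (91 values, 2010-11-01 .. 2011-01-30).
data = np.array([
    22, 624, 634, 774, 726, 752, 38, 534, 722, 678, 750, 690, 686, 26, 708, 606,
    632, 632, 632, 584, 28, 576, 474, 536, 512, 464, 436, 24, 448, 408, 528,
    602, 638, 640, 26, 658, 548, 620, 534, 422, 482, 26, 616, 612, 622, 598, 614,
    614, 24, 644, 506, 522, 622, 526, 26, 22, 738, 582, 592, 408, 466, 568,
    44, 680, 652, 598, 642, 714, 562, 38, 778, 796, 742, 460, 610, 42, 38, 732,
    650, 670, 618, 574, 42, 22, 610, 456, 22, 630, 408, 390, 24,
], dtype=float)

nobs = len(data)
schwert_bound = 12 * (nobs / 100) ** (1 / 4)       # the "python default" line in R
r_default_k = math.trunc((nobs - 1) ** (1 / 3))    # the "R default" line (tseries::adf.test)
print(nobs, round(schwert_bound, 5), r_default_k)  # 91 11.72038 4


def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (mean removed, divided by the lag-0 sum)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = d @ d
    return np.array([d[: len(d) - k] @ d[k:] / denom for k in range(nlags + 1)])


acf = sample_acf(data, 19)
print(np.round(acf, 3))
# Lag with the largest autocorrelation among lags 1..19
print("strongest lag:", int(np.argmax(acf[1:])) + 1)
```

With strong dependence at lag 7 (and 14), one plausible reading is that k = 4 leaves the weekly pattern in the test regression's residuals, whereas k = 7 and k = 12 absorb it, which would fit the very different R results for those lag orders.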
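adfuller does not reject the unit root. That can also mean that, even if the process is stationary, the test does not have enough power to reject the unit-root hypothesis.
What did you use in R to "show" that the series is stationary?
I used auto.arima(y) in R, which gave me the order (1,0,1), but in Python the adfuller test describes series y as non-stationary. Even when I specify lag order = 12, the series remains non-stationary: the test gives adf = -1.96448506629, p = 0.3023588762, which differs significantly from the lag-12 result in your answer. There is also another data set that both R and Python report as non-stationary; for that series in Python, if the lag is taken as 4, it becomes stationary. How should the actual lag be chosen? This is the series mentioned above: [ 9560,4010,3790,3840,9150,10230,9570,8230,4640,3730,5820,10410,10220,10040,6720,4290,3820,8700,10040,10820,10080,4160,4320,4140, 9360,10000,10410,7830,9640,3950,5130,9420,9590,9070,10950,10320,3640,4260,10270,10380,9230,10750,10410,5160,5540,11160,11000,11110, 9850,867,4830,5100,10680,11290,10930,10410,10380,4300,4270,10550,9170,13158,12407,10111,599750831057710464105921190811150586755711226211099858489801391613556811030908010124541089910706525957311139299201164011401401401401]
See the autocorrelation function, acf(..., plot = TRUE): the seventh lag is significant.
With a fixed lag and both a constant and a trend,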
sm.tsa.adfuller(a, maxlag=12, autolag=None, regression='ct')  # a: the series under test
reports the same test statistic as Robert's answer, but a different p-value, which is based on extrapolation and is not exact.
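To see what a fixed lag with a constant and trend means, here is a minimal NumPy sketch of the ADF regression: the first difference is regressed on a constant, a linear trend, the lagged level x_{t-1} and k lagged differences, and the statistic is the t-value of the x_{t-1} coefficient. Our understanding is that this is the regression behind both adfuller(..., autolag=None, regression='ct') and tseries::adf.test, which would explain why the statistics agree; that equivalence is an assumption, not checked here. choose_lag_aic is a hypothetical, simplified imitation of lag selection by AIC (all candidate lags fitted on a common sample), in the spirit of adfuller's default autolag='AIC'; statsmodels' exact bookkeeping may differ. No p-values are computed.

```python
import numpy as np


def adf_design(x, k, maxlag=None):
    """ADF regression with constant, linear trend, x_{t-1} and k lagged differences.

    If maxlag is given, the first maxlag differences are dropped regardless of k,
    so every k in 0..maxlag is fitted on the same observations (needed to compare AICs).
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                                   # first differences, length len(x) - 1
    start = k if maxlag is None else maxlag
    n = len(dx) - start                               # observations in the regression
    cols = [np.ones(n), np.arange(1.0, n + 1.0), x[start:-1]]     # const, trend, x_{t-1}
    cols += [dx[start - i:len(dx) - i] for i in range(1, k + 1)]  # lagged differences 1..k
    return dx[start:], np.column_stack(cols)


def adf_stat(x, k):
    """t-statistic of the x_{t-1} coefficient with exactly k lagged differences."""
    y, X = adf_design(x, k)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    cov = (resid @ resid / (n - p)) * np.linalg.inv(X.T @ X)
    return beta[2] / np.sqrt(cov[2, 2])               # column 2 is x_{t-1}


def choose_lag_aic(x, maxlag):
    """Lag k in 0..maxlag with the smallest Gaussian AIC on a common sample."""
    best_k, best_aic = 0, np.inf
    for k in range(maxlag + 1):
        y, X = adf_design(x, k, maxlag=maxlag)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ beta) ** 2)
        n, p = X.shape
        aic = n * np.log(rss / n) + 2 * p             # additive constants dropped
        if aic < best_aic:
            best_k, best_aic = k, aic
    return best_k


# The question's 91-day series.
data = np.array([
    22, 624, 634, 774, 726, 752, 38, 534, 722, 678, 750, 690, 686, 26, 708, 606,
    632, 632, 632, 584, 28, 576, 474, 536, 512, 464, 436, 24, 448, 408, 528,
    602, 638, 640, 26, 658, 548, 620, 534, 422, 482, 26, 616, 612, 622, 598, 614,
    614, 24, 644, 506, 522, 622, 526, 26, 22, 738, 582, 592, 408, 466, 568,
    44, 680, 652, 598, 642, 714, 562, 38, 778, 796, 742, 460, 610, 42, 38, 732,
    650, 670, 618, 574, 42, 22, 610, 456, 22, 630, 408, 390, 24,
], dtype=float)

for k in (4, 7, 12):
    print(k, round(float(adf_stat(data, k)), 4))
k_aic = choose_lag_aic(data, 12)
print("AIC-chosen lag:", k_aic, round(float(adf_stat(data, k_aic)), 4))
```

If the regression matches tseries::adf.test as we believe, the printed statistics for k = 4, 7 and 12 should be close to the R output above; the point of the loop is that the verdict can flip with k, so the lag order has to be justified (for example by AIC or by the seasonal lag visible in the ACF) rather than left at a package default.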