What am I doing wrong when using seasonal decompose in Python?
I have a small time series at monthly intervals. I want to plot it and then decompose it into seasonality, trend, and residuals. I first import the CSV into pandas and plot the time series, which works fine. Following a tutorial, my code looks like this:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
ali3 = pd.read_csv('C:\\Users\\ALI\\Desktop\\CSV\\index\\ZIAM\\ME\\ME_DATA_7_MONTH_AVG_PROFIT\\data.csv',
names=['Date', 'Month','AverageProfit'],
index_col=['Date'],
parse_dates=True)
# Delete the Month column, which is a string
del ali3['Month']
ali3
plt.plot(ali3)
At this stage, I try to do the seasonal decomposition like this:
import statsmodels.api as sm
res = sm.tsa.seasonal_decompose(ali3.AverageProfit)
fig = res.plot()
This results in the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-41-afeab639d13b> in <module>()
1 import statsmodels.api as sm
----> 2 res = sm.tsa.seasonal_decompose(ali3.AverageProfit)
3 fig = res.plot()
C:\Users\D063375\AppData\Local\Continuum\Anaconda2\lib\site-packages\statsmodels\tsa\seasonal.py in seasonal_decompose(x, model, filt, freq)
86 filt = np.repeat(1./freq, freq)
87
---> 88 trend = convolution_filter(x, filt)
89
90 # nan pad for conformability - convolve doesn't do it
C:\Users\D063375\AppData\Local\Continuum\Anaconda2\lib\site-packages\statsmodels\tsa\filters\filtertools.py in convolution_filter(x, filt, nsides)
287
288 if filt.ndim == 1 or min(filt.shape) == 1:
--> 289 result = signal.convolve(x, filt, mode='valid')
290 elif filt.ndim == 2:
291 nlags = filt.shape[0]
C:\Users\D063375\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\signal\signaltools.py in convolve(in1, in2, mode)
468 return correlate(volume, kernel[slice_obj].conj(), mode)
469 else:
--> 470 return correlate(volume, kernel[slice_obj], mode)
471
472
C:\Users\D063375\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\signal\signaltools.py in correlate(in1, in2, mode)
158
159 if mode == 'valid':
--> 160 _check_valid_mode_shapes(in1.shape, in2.shape)
161 # numpy is significantly faster for 1d
162 if in1.ndim == 1 and in2.ndim == 1:
C:\Users\D063375\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\signal\signaltools.py in _check_valid_mode_shapes(shape1, shape2)
70 if not d1 >= d2:
71 raise ValueError(
---> 72 "in1 should have at least as many items as in2 in "
73 "every dimension for 'valid' mode.")
74
ValueError: in1 should have at least as many items as in2 in every dimension for 'valid' mode.
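You have 7 data points, which is usually far too few for a stationarity analysis.

You do not have enough points to use seasonal decomposition. With a monthly index, freq is presumably inferred as 12, so the moving-average filter is longer than your 7-point series, which is what the ValueError about 'valid' mode reports. To see this, you can concatenate your data to create an extended time series (simply repeat your data for the following months). Let extendedData be this extended DataFrame and data your original data.

data.plot()

The frequency (freq) used for the seasonal estimate is inferred automatically from the data, and it can also be specified manually.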
You can try taking a first difference: build a new time series in which the previous value is subtracted from each value.
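A minimal sketch of first differencing with pandas, again using made-up numbers in place of the real AverageProfit column:

```python
import pandas as pd

# Made-up stand-in for the 7 monthly values (not the real data).
data = pd.Series(
    [10.0, 12.0, 9.0, 14.0, 11.0, 13.0, 15.0],
    index=pd.date_range("2016-01-01", periods=7, freq="MS"),
    name="AverageProfit",
)

# diff() computes x[t] - x[t-1]; the first entry has no predecessor
# and becomes NaN, so it is dropped.
first_diff = data.diff().dropna()
print(first_diff)
```

The differenced series has one point fewer than the original, so with 7 observations only 6 differences remain.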
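A stationarity test can be applied next.

Could you please include the data frame in the question as text rather than as a screenshot?

OK, I will edit it.

Sorry, I misunderstood your question earlier. Could it be that you did not pass the freq value, i.e. the seasonal time scale?

Thanks for your answer, but my data set contains only seven months of data. Does that mean I cannot decompose it?

Are we talking about yearly seasonality (i.e. 12 months)? If so, you will need more data, at least 2 years, although it all depends on how random the data is, and so on. The link I posted earlier has valuable insights!

I am new to the statistical study of time series. I have seven months of data and want to see whether the series is stationary. I thought decomposing it into trend, seasonality, and residuals over the seven months might make that more apparent.

Thanks for your comprehensive answer. I really appreciate it. It also made me realize that I need to learn more about the theory behind time series analysis.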
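The rule of thumb from the comments (at least two full seasonal cycles) can be checked before attempting a decomposition. The helper name `has_enough_cycles` and its `min_cycles` parameter are hypothetical, introduced here only for illustration:

```python
import pandas as pd


def has_enough_cycles(series, period, min_cycles=2):
    """Return True if the series holds at least `min_cycles` full periods
    of non-missing observations (hypothetical helper, not a library API)."""
    n_obs = series.dropna().shape[0]
    return n_obs >= min_cycles * period


# Seven monthly points against a yearly period of 12: not enough.
seven_months = pd.Series(
    range(7), index=pd.date_range("2016-01-01", periods=7, freq="MS")
)
print(has_enough_cycles(seven_months, period=12))

# Two full years of monthly points: enough by this rule of thumb.
two_years = pd.Series(
    range(24), index=pd.date_range("2016-01-01", periods=24, freq="MS")
)
print(has_enough_cycles(two_years, period=12))
```

Passing this check only means the decomposition can run; whether the result is meaningful still depends on how noisy the data is.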