Python: undoing a series of differences
I have a pandas Series of monthly data (df.sales). To fit a time-series model I need to subtract the value from 12 months earlier, so I ran the following command:
sales_new = df.sales.diff(periods=12)
Then I fitted an ARMA model and forecast into the future:
from statsmodels.tsa.arima_model import ARMA  # legacy API, removed in statsmodels 0.13

model = ARMA(sales_new, order=(2, 0)).fit()
model.predict('2015-01-01', '2017-01-01')
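Because I differenced the sales data, the model forecasts future differences rather than the sales themselves. If this were a period-1 difference I would just use np.cumsum(), but since the period is 12 it is a bit trickier.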
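What is the best way to "undo" the differencing and bring the forecast back to the scale of the original data?

I think you need to compute each value from the value 12 months earlier: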
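from datetime import date
import numpy as np
import pandas as pd

periods = 12
# 24 months of random data indexed by month-end dates ('M' is spelled 'ME' in pandas >= 2.2)
df = pd.DataFrame(data={'value': np.random.random(size=24)}, index=pd.date_range(start=date(2014, 1, 1), freq='M', periods=24))
diffs = df.diff(periods=periods)
# Keep the first `periods` values; everything after them is rebuilt from the diffs
restored = df.copy()
restored.iloc[periods:] = np.nan
for d, val in diffs.iloc[periods:].iterrows():
    # value 12 months earlier + the 12-month difference
    restored.loc[d] = restored.loc[d - pd.DateOffset(months=periods)].value + val
res = pd.concat([df, diffs, restored], axis=1)
res.columns = ['original', 'diffs', 'restored']
print(res)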
original diffs restored
2014-01-31 0.926367 NaN 0.926367
2014-02-28 0.688898 NaN 0.688898
2014-03-31 0.297025 NaN 0.297025
2014-04-30 0.139094 NaN 0.139094
2014-05-31 0.375082 NaN 0.375082
2014-06-30 0.490638 NaN 0.490638
2014-07-31 0.789683 NaN 0.789683
2014-08-31 0.236841 NaN 0.236841
2014-09-30 0.263245 NaN 0.263245
2014-10-31 0.547025 NaN 0.547025
2014-11-30 0.243444 NaN 0.243444
2014-12-31 0.385028 NaN 0.385028
2015-01-31 0.823224 -0.103142 0.823224
2015-02-28 0.828245 0.139347 0.828245
2015-03-31 0.753291 0.456266 0.753291
2015-04-30 0.447670 0.308576 0.447670
2015-05-31 0.936667 0.561584 0.936667
2015-06-30 0.223049 -0.267589 0.223049
2015-07-31 0.933942 0.144259 0.933942
2015-08-31 0.325726 0.088886 0.325726
2015-09-30 0.947526 0.684281 0.947526
2015-10-31 0.524749 -0.022276 0.524749
2015-11-30 0.431671 0.188227 0.431671
2015-12-31 0.234028 -0.151000 0.234028
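The loop above rebuilds months that were already observed. To turn forecasted 12-month differences (such as the output of `model.predict` in the question) back into sales, the same recurrence, x[t] = x[t-12] + diff[t], can be run forward: the first 12 forecasts lean on the last 12 observed values, and later ones lean on earlier forecasts. A minimal sketch; `forecast_levels` and its arguments are hypothetical names, and the forecasted differences are assumed to start in the month right after the last observation:

```python
import numpy as np


def forecast_levels(history, forecast_diffs, periods=12):
    """Turn forecasted lag-`periods` differences into levels (hypothetical helper).

    history: observed levels, oldest first (at least `periods` of them).
    forecast_diffs: predicted x[t] - x[t - periods] for the steps that
    immediately follow `history`.
    """
    # Only the last `periods` observations are needed as seeds
    levels = list(np.asarray(history, dtype=float)[-periods:])
    for diff in np.asarray(forecast_diffs, dtype=float):
        # levels[-periods] is the value exactly `periods` steps earlier,
        # either observed or an already-reconstructed forecast
        levels.append(levels[-periods] + diff)
    # Drop the seeds, return only the reconstructed forecasts
    return np.array(levels[periods:])


# Check on made-up data: pretend the true future differences were forecast
full = np.random.default_rng(1).random(40)
true_diffs = full[12:] - full[:-12]  # true_diffs[k] belongs to position k + 12
print(np.allclose(forecast_levels(full[:24], true_diffs[12:]), full[24:]))  # True
```

With the question's objects this would be called roughly as `forecast_levels(df.sales, model.predict(start, end))`, where `start` is assumed to be the first month after the observed data ends.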
This should do it:
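def rebuild_diffed(series, first_element_original):
    # Inverts a lag-1 diff(): the cumulative sum of x[t] - x[t-1] is x[t] - x[0]
    cumsum = series.cumsum()
    # diff() leaves a NaN at position 0; treat it as 0 and add x[0] back
    return cumsum.fillna(0) + first_element_original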
Step-by-step version:
import pandas as pd

# making some data
a = pd.Series([2, 6, 4, 6, 2])
print(a)
a_diff = a.diff()
print(a_diff)
# Rebuilding
a_diff_cumsum = a_diff.cumsum()
print(a_diff_cumsum)
rebuilt = a_diff_cumsum.fillna(0) + 2  # 2 is the first element of a
print(rebuilt)
print(rebuilt == a)
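The cumulative-sum trick above only inverts a lag-1 `diff()`. For the 12-month difference in the question, one way to reuse it: `diff(periods=12)` links each position only to the one 12 steps earlier, so the series splits into 12 interleaved lag-1 chains. Seeding each chain with its first original value and taking a cumulative sum per chain restores the data. A minimal sketch; the helper `undo_seasonal_diff` is hypothetical, and the chains are defined by position rather than by date, so the series is assumed to have no missing months:

```python
import numpy as np
import pandas as pd


def undo_seasonal_diff(diffs, first_values, periods=12):
    """Invert s.diff(periods), given the first `periods` original values.

    Hypothetical helper. `diffs` is the differenced Series (its first
    `periods` entries are NaN); `first_values` are s.iloc[:periods].
    """
    values = diffs.to_numpy(dtype=float).copy()
    # Replace the leading NaNs with the original values that seed each chain
    values[:periods] = np.asarray(first_values, dtype=float)
    # Position t belongs to chain t % periods; within a chain the entries are
    # x[c], x[c+k]-x[c], x[c+2k]-x[c+k], ..., so a cumsum gives x back
    chain = np.arange(len(values)) % periods
    return pd.Series(values, index=diffs.index).groupby(chain).cumsum()


# Round trip on made-up data
s = pd.Series(np.random.default_rng(0).random(30))
rebuilt = undo_seasonal_diff(s.diff(periods=12), s.iloc[:12], periods=12)
print(np.allclose(rebuilt, s))  # True
```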
To difference, use this:
import numpy as np

def differentiate(values, d=1):
    # keep the first value, then take lag-1 differences; repeat d times
    x = np.concatenate([[values[0]], values[1:] - values[:-1]])
    if d != 1:
        return differentiate(x, d - 1)
    else:
        return x
To integrate back, use this:
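def integrate(values, d=1):
    # a cumulative sum undoes one differentiate() pass; repeat d times
    # (this inverts lag-1 differences, not lag-12 seasonal ones)
    x = np.cumsum(values)
    if d != 1:
        return integrate(x, d - 1)
    else:
        return x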
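`differentiate` and `integrate` work on lag-1 differences, so on their own they do not undo the question's 12-month difference. Following the same convention (the seed values stay at the front of the differenced array), a lag-k pair might look like the sketch below. `seasonal_differentiate` and `seasonal_integrate` are hypothetical names, the inputs are assumed to be 1-D arrays, and with `k=1` the pair reduces to `differentiate`/`integrate` with `d=1`:

```python
import numpy as np


def seasonal_differentiate(values, k=12):
    # Keep the first k values as seeds, then x[t] - x[t-k]
    values = np.asarray(values, dtype=float)
    return np.concatenate([values[:k], values[k:] - values[:-k]])


def seasonal_integrate(values, k=12):
    # Walk forward: x[t] = x[t-k] + diff[t]; the first k entries are the seeds
    out = np.asarray(values, dtype=float).copy()
    for t in range(k, len(out)):
        out[t] += out[t - k]
    return out


x = np.arange(30, dtype=float) ** 2
print(np.allclose(seasonal_integrate(seasonal_differentiate(x)), x))  # True
```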
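Make sure your input is a numpy array. You can also change the differencing order via d. So the integrate function is what you are looking for.

Could you give an example DataFrame showing what you have and the result you want to get?

Did this help at all?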