Python: How to include lagged variables in a statsmodels OLS regression model
Is there a way to specify a lagged independent variable in a statsmodels OLS regression model? Below are a sample dataframe and the OLS model specification. I would like to include a lagged variable in the model.
df = pd.DataFrame({
"y": [2,3,7,8,1],
"x": [8,6,2,1,9],
"v": [4,3,1,3,8]
})
Current model:
model = sm.ols(formula = 'y ~ x + v', data=df).fit()
Desired model:
model_lag = sm.ols(formula = 'y ~ (x-1) + v', data=df).fit()
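I don't think you can create the lag on the fly inside the formula. Maybe use the pandas shift method to build the lagged column first? If this is not what you need, please clarify.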
import statsmodels.api as sm
df['xlag'] = df['x'].shift()
df
   y  x  v  xlag
0  2  8  4   NaN
1  3  6  3   8.0
2  7  2  1   6.0
3  8  1  3   2.0
4  1  9  8   1.0
sm.formula.ols(formula = 'y ~ xlag + v', data=df).fit()
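For reference, here is a minimal self-contained sketch of the shift approach using the question's data. The column name xlag comes from the answer above; the name model_lag is taken from the question. Note that shift leaves a NaN in the first row, and the formula interface drops rows with missing values by default, so the model is fit on four observations rather than five.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "y": [2, 3, 7, 8, 1],
    "x": [8, 6, 2, 1, 9],
    "v": [4, 3, 1, 3, 8],
})

# shift(1) moves every value down one row, so row t holds x from row t-1.
df["xlag"] = df["x"].shift(1)

# The first row of xlag is NaN. Rows with missing values are dropped
# by the formula interface by default, so 4 of the 5 rows are used.
model_lag = smf.ols(formula="y ~ xlag + v", data=df).fit()

print(int(model_lag.nobs))  # number of observations actually used
print(model_lag.params)     # Intercept, xlag, v
```

With only five rows this is purely illustrative; each additional lag costs one more observation at the start of the sample.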
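This already has an accepted answer, but to add my 2 cents:
- It is best to validate the index before shifting (otherwise your lag may not be what you think it is)
- You can define a function that is reusable in many places within the formula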
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
df = pd.DataFrame({"y": [2, 3, 7, 8, 1], "x": [8, 6, 2, 1, 9], "v": [4, 3, 1, 3, 8]})
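# Quarterly index built from period strings (the year=/quarter= keyword
# form of the PeriodIndex constructor is deprecated in recent pandas).
df.index = pd.PeriodIndex(
    ["2000Q1", "2000Q2", "2000Q3", "2000Q4", "2001Q1"], freq="Q", name="period"
)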
def lag(x, n, validate=True):
    """Calculates the lag of a pandas Series

    Args:
        x (pd.Series): the data to lag
        n (int): How many periods to go back (lag length)
        validate (bool, optional): Validate the series index
            (monotonic increasing + no gaps + no duplicates).
            If specified, expect the index to be a pandas PeriodIndex.
            Defaults to True.

    Returns:
        pd.Series: pd.Series.shift(n) -- lagged series
    """
    if n == 0:
        return x
    if isinstance(x, pd.Series):
        if validate:
            assert x.index.is_monotonic_increasing, (
                "\u274c" + "x.index is not monotonic_increasing"
            )
            assert x.index.is_unique, "\u274c" + "x.index is not unique"
            # Rebuild the full period range and compare it to the actual index.
            idx_full = pd.period_range(
                start=x.index.min(), end=x.index.max(), freq=x.index.freq
            )
            assert np.all(x.index == idx_full), "\u274c" + "Gaps found in x.index"
        return x.shift(n)
    return x.shift(n)
# Manually create lag as variable:
df["x_1"] = df["x"].shift(1)
smf.ols(formula="y ~ x_1 + v", data=df).fit().summary()
# Use the defined function in the formula:
smf.ols(formula="y ~ lag(x,1) + v", data=df).fit().summary()
# ... can use in multiple places too:
smf.ols(formula="y ~ lag(x,1) + lag(v, 2)", data=df).fit().summary()
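
To illustrate the first point above (why the index should be validated before shifting), here is a small sketch with made-up quarterly data stored out of order. The series s and its values are hypothetical and not part of the original example. Because shift works on row position rather than on the period labels, the naive lag for 2000Q3 picks up the 2000Q1 value instead of the 2000Q2 value.

```python
import pandas as pd

# Hypothetical quarterly data stored out of order (2000Q3 before 2000Q2).
idx = pd.PeriodIndex(["2000Q1", "2000Q3", "2000Q2", "2000Q4"], freq="Q")
s = pd.Series([10, 30, 20, 40], index=idx)

q3 = pd.Period("2000Q3", freq="Q")

# shift() lags by row position, not by the period label.
naive = s.shift(1)
# Sorting by the index first makes "previous row" mean "previous quarter".
sorted_lag = s.sort_index().shift(1)

print(s.index.is_monotonic_increasing)  # False: the check in lag() would fail
print(naive.loc[q3])       # 10.0, the 2000Q1 value (wrong lag)
print(sorted_lag.loc[q3])  # 20.0, the 2000Q2 value (intended lag)
```

Sorting fixes the ordering problem, but gaps or duplicate periods would still produce misleading lags, which is what the other two assertions in lag() guard against.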