Python: looping over variables for a simple OLS
I want to build a function in Python that fits a simple OLS regression using the following equation:
Y_i - Y_i-1 = A + B(X_i - X_i-1) + E
In other words, Y_lag = alpha + beta(X_lag) + error term
Currently I have the following dataset (this is a shortened version).
Note: Y = Historic_Rate
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)), columns=['Historic_Rate', 'Overnight', '1M', '3M', '6M'])
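So what I am trying to build is something that iteratively takes one X variable at a time and puts it into a simple linear regression. The code I have built so far looks like this: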
#Start the iteration process for the regression to in turn fit 1 parameter
#Import required packages
import pandas as pd
import numpy as np
import statsmodels.formula.api as sm
#Import dataset
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)), columns=['Historic_Rate', 'Overnight', '1M', '3M', '6M'])
#Y_Lag is always 1 time period only
df['Y_Lag'] = df['Historic_Rate'].shift(1)
#Begin the process with 1 lag, taking one x variable in turn
array = df[0:0]
array.drop(array.columns[[0,5]], axis=1, inplace=True)
for X in array:
    df['X_Lag'] = df['X'].shift(1)
    Model = df[df.columns[4:5]]
    Y = Model['Y_Lag']
    X = Model['X_Lag']
    Reg_model = sm.OLS(Y,X).fit()
    predictions = model.predict(X)
    # make the predictions by the model
    # Print out the statistics
    model.summary()
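So, essentially, I want to create a list of column headers that is fed systematically through my loop, with each variable lagged and then regressed against the lagged Y variable.
I would also like to know how to output model.X, where X is the Xth iteration of the array, so that the variables are named dynamically.
You are close. I think you just confused the variable X with the string 'X' in your loop. I also think you are not computing y_i - y_{i-1}, but are instead regressing on y_{i-1}.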
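The two points above can be illustrated with a small sketch. The values below are made up for illustration and are not the question's data; the dictionary name `x_diffs` is likewise hypothetical.

```python
import pandas as pd

# Illustrative values only (not the question's data)
df = pd.DataFrame({"Historic_Rate": [3.0, 5.0, 4.0, 8.0],
                   "1M": [1.0, 2.0, 4.0, 7.0],
                   "3M": [2.0, 2.0, 5.0, 6.0]})

# shift(1) only gives the previous value; the equation needs current minus previous
y_prev = df["Historic_Rate"].shift(1)
y_diff = df["Historic_Rate"] - y_prev      # equivalent to df["Historic_Rate"].diff()

x_diffs = {}
for col in df.columns.drop("Historic_Rate"):
    # df[col] uses the loop variable; df["col"] would look up a column literally named "col"
    x_diffs[col] = df[col].diff()

print(y_diff.tolist())   # the first entry is NaN because there is no previous observation
print(list(x_diffs))
```

The first difference always starts with a NaN, which has to be dropped before fitting.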
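Here is how to loop over the regressions. We will also use a dictionary to store the regression results, with the column names as keys.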
import pandas as pd
import numpy as np
import statsmodels.api as sm
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
                  columns=['Historic_Rate', 'Overnight', '1M', '3M', '6M'])
fit_d = {}  # This will hold all of the fit results and summaries
for col in [x for x in df.columns if x != 'Historic_Rate']:
    Y = df['Historic_Rate'] - df['Historic_Rate'].shift(1)
    # Need to remove the NaN for fit
    Y = Y[Y.notnull()]
    X = df[col] - df[col].shift(1)
    X = X[X.notnull()]
    X = sm.add_constant(X)  # Add a constant to the fit
    fit_d[col] = sm.OLS(Y,X).fit()
Now if you want to make some predictions, say with your last model, you can do:
fit_d['6M'].predict(sm.add_constant(df['6M']-df['6M'].shift(1)))
#0 NaN
#1 0.5
#2 -2.0
#3 -1.0
#4 -0.5
#dtype: float64
And you can get the summary with fit_d['6M'].summary():
                            OLS Regression Results
==============================================================================
Dep. Variable:          Historic_Rate   R-squared:                       0.101
Model:                            OLS   Adj. R-squared:                 -0.348
Method:                 Least Squares   F-statistic:                    0.2254
Date:                Thu, 27 Sep 2018   Prob (F-statistic):              0.682
Time:                        11:27:33   Log-Likelihood:                -9.6826
No. Observations:                   4   AIC:                             23.37
Df Residuals:                       2   BIC:                             22.14
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.4332      1.931     -0.224      0.843      -8.740       7.873
6M            -0.2674      0.563     -0.475      0.682      -2.691       2.156
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   2.301
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.254
Skew:                          -0.099   Prob(JB):                        0.881
Kurtosis:                       1.781   Cond. No.                         3.44
==============================================================================
A huge oversight on my part: I did not even realize I was not taking the differences. This looks very comprehensive, thank you!
Also, don't forget that in statsmodels you need to add a constant to the fit manually.
Thanks, this works perfectly. I look forward to becoming as familiar with this syntax as I am with other, more basic languages.