Python: A super dictionary of simple OLS regressions
I am trying to build a super dictionary that contains a number of lower-level dictionaries.
Concept
I have my retail bank's interest rates for the past 12 years, and I am trying to model the rate using a portfolio of different bonds.
Regression formula
Y_i - Y_{i-1} = A + B (X_i - X_{i-1}) + E
Note: Y = Historic Rate
In other words, Y_Lag = alpha + beta (X_Lag) + error term
Data
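To make the formula concrete, here is a minimal sketch of a first-difference regression on toy data. The series `x` and `y`, the helper name `first_difference_ols`, and the chosen coefficients (A = 0.5, B = 2) are all made up for illustration, and plain `numpy.linalg.lstsq` stands in for statsmodels here only to keep the example small.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y is constructed so that its first differences
# are exactly 0.5 + 2 * (first differences of x), i.e. A = 0.5, B = 2, E = 0.
x = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 7.0])
dx = x.diff()                      # X_i - X_{i-1}; the first entry is NaN
dy = 0.5 + 2.0 * dx                # Y_i - Y_{i-1}
y = dy.fillna(0.0).cumsum()        # rebuild a level series for Y


def first_difference_ols(y, x):
    """Regress diff(y) on diff(x) with an intercept and return (A, B)."""
    d = pd.concat([y.diff(), x.diff()], axis=1, keys=["dy", "dx"]).dropna()
    design = np.column_stack([np.ones(len(d)), d["dx"].to_numpy()])
    coef, *_ = np.linalg.lstsq(design, d["dy"].to_numpy(), rcond=None)
    return float(coef[0]), float(coef[1])


A, B = first_difference_ols(y, x)
```

Because the toy data contain no error term, the fit should recover A and B up to floating-point rounding.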
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(100,17)),
                  columns=['Historic Rate', 'Overnight', '1M', '3M', '6M','1Y','2Y','3Y','4Y','5Y','6Y','7Y','8Y','9Y','10Y','12Y','15Y'])
Code so far
#Import packages required for the analysis
import pandas as pd
import numpy as np
import statsmodels.api as sm

def Simulation(TotalSim,j):
    #super dictionary to hold all iterations of the loop
    Super_fit_d = {}
    for i in range(1,TotalSim):
        #Create a introductory loop to run the first set of regressions
        #Each loop produces a univariate regression
        #Each loop has a fixed lag of i
        fit_d = {} # This will hold all of the fit results and summaries
        for col in [x for x in df.columns if x != 'Historic Rate']:
            Y = df['Historic Rate'] - df['Historic Rate'].shift(1)
            # Need to remove the NaN for fit
            Y = Y[Y.notnull()]
            X = df[col] - df[col].shift(i)
            X = X[X.notnull()]
            #Y now has more observations than X due to lag, drop rows to match
            Y = Y.drop(Y.index[0:i-1])
            if j = 1:
                X = sm.add_constant(X) # Add a constant to the fit
            fit_d[col] = sm.OLS(Y,X).fit()
        #append the dictionary for each lag onto the super dictionary
        Super_fit_d[lag_i] = fit_d
    #Check the output for one column
    fit_d['Overnight'].summary()
    #Check the output for one column in one segment of the super dictionary
    Super_fit_d['lag_5'].fit_d['Overnight'].summary()

Simulation(11,1)
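Question
I seem to be overwriting my dictionary on each pass through the loop, and I am not evaluating i correctly to index the iterations as lag_1, lag_2, lag_3, and so on. How can I fix this?
Thanks in advance.

There are a few issues here: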
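You sometimes use i and sometimes lag_i, but only i is defined. For consistency, I changed them all to lag_i.
if j = 1 is incorrect syntax. You need if j == 1.
You need to return the results (fit_d, or the Super_fit_d that holds it) so that they persist after the function finishes.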
import pandas as pd
import numpy as np
import statsmodels.api as sm

df = pd.DataFrame(np.random.randint(low=0, high=10, size=(100,17)),
                  columns=['Historic Rate', 'Overnight', '1M', '3M', '6M','1Y','2Y','3Y','4Y','5Y','6Y','7Y','8Y','9Y','10Y','12Y','15Y'])

def Simulation(TotalSim,j):
    Super_fit_d = {}
    for lag_i in range(1,TotalSim):
        #Create a introductory loop to run the first set of regressions
        #Each loop produces a univariate regression
        #Each loop has a fixed lag of lag_i
        fit_d = {} # This will hold all of the fit results and summaries
        for col in [x for x in df.columns if x != 'Historic Rate']:
            Y = df['Historic Rate'] - df['Historic Rate'].shift(1)
            # Need to remove the NaN for fit
            Y = Y[Y.notnull()]
            X = df[col] - df[col].shift(lag_i)
            X = X[X.notnull()]
            #Y now has more observations than X due to lag, drop rows to match
            Y = Y.drop(Y.index[0:lag_i-1])
            if j == 1:
                X = sm.add_constant(X) # Add a constant to the fit
            fit_d[col] = sm.OLS(Y,X).fit()
        #append the dictionary for each lag onto the super dictionary
        # return fit_d
        Super_fit_d[lag_i] = fit_d
    return Super_fit_d

test_dict = Simulation(11,1)
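As an aside on the "drop rows to match" step: one alternative, offered only as a sketch, is to put both differenced series side by side and let pandas drop the incomplete rows, which avoids computing the slice by hand. The helper name `aligned_differences` and the two-column `demo` frame are hypothetical and not part of the code above.

```python
import numpy as np
import pandas as pd


def aligned_differences(df, col, lag, target="Historic Rate"):
    """Return the differenced target (Y) and the lag-differenced column (X),
    keeping only the rows where both values are defined."""
    y = df[target] - df[target].shift(1)
    x = df[col] - df[col].shift(lag)
    return pd.concat([y.rename("Y"), x.rename("X")], axis=1).dropna()


# Hypothetical two-column frame with the same shape of data as in the question.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(0, 10, size=(100, 2)),
                    columns=["Historic Rate", "Overnight"])

pair = aligned_differences(demo, "Overnight", lag=3)

# The manual approach from the code above, for comparison.
manual_Y = demo["Historic Rate"] - demo["Historic Rate"].shift(1)
manual_Y = manual_Y[manual_Y.notnull()]
manual_Y = manual_Y.drop(manual_Y.index[0:3 - 1])
```

With 100 rows and a lag of 3, both approaches keep the rows labelled 3 through 99; `pair["Y"]` and `pair["X"]` could then be passed to `sm.OLS` in the same way as Y and X above.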
First lag
test_dict[1]['Overnight'].summary()
Out[76]:
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:          Historic Rate   R-squared:                       0.042
Model:                            OLS   Adj. R-squared:                  0.033
Method:                 Least Squares   F-statistic:                     4.303
Date:                Fri, 28 Sep 2018   Prob (F-statistic):             0.0407
Time:                        11:15:13   Log-Likelihood:                -280.39
No. Observations:                  99   AIC:                             564.8
Df Residuals:                      97   BIC:                             570.0
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0048      0.417     -0.012      0.991      -0.833       0.823
Overnight      0.2176      0.105      2.074      0.041       0.009       0.426
==============================================================================
Omnibus:                        1.449   Durbin-Watson:                   2.756
Prob(Omnibus):                  0.485   Jarque-Bera (JB):                1.180
Skew:                           0.005   Prob(JB):                        0.554
Kurtosis:                       2.465   Cond. No.                         3.98
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""
Second lag
test_dict[2]['Overnight'].summary()
Out[77]:
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:          Historic Rate   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                 -0.010
Method:                 Least Squares   F-statistic:                   0.06845
Date:                Fri, 28 Sep 2018   Prob (F-statistic):              0.794
Time:                        11:15:15   Log-Likelihood:                -279.44
No. Observations:                  98   AIC:                             562.9
Df Residuals:                      96   BIC:                             568.0
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0315      0.428      0.074      0.941      -0.817       0.880
Overnight      0.0291      0.111      0.262      0.794      -0.192       0.250
==============================================================================
Omnibus:                        2.457   Durbin-Watson:                   2.798
Prob(Omnibus):                  0.293   Jarque-Bera (JB):                1.735
Skew:                           0.115   Prob(JB):                        0.420
Kurtosis:                       2.391   Cond. No.                         3.84
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""
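I feel there is still a problem with overwriting the first dictionary under the same headers. I need levels of dictionaries, one for each lag.
Check my latest edit. I think the new output is what you want, but if it isn't, I will change it.
One more edit. The new output is a dict of dicts: the first-level index is the lag, and the second-level index is the column name.
Thanks. I have to apologise, I can't look at this right now, but I will take a look later tonight.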
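The dict-of-dicts layout described in the comments can be sketched without any regression at all. The example below uses string keys such as lag_1, as the question asked for, instead of the integer keys in the answer. The names `build_nested` and `fake_fit` are hypothetical, and `fake_fit` is only a stand-in for `sm.OLS(Y, X).fit()`.

```python
def build_nested(lags, columns, fit):
    """Build {"lag_<n>": {column: fit(column, n)}} with a fresh inner dict per lag."""
    results = {}
    for lag in lags:
        inner = {}                     # new dict on every pass, so earlier lags are not overwritten
        for col in columns:
            inner[col] = fit(col, lag)
        results[f"lag_{lag}"] = inner  # first level: lag label; second level: column name
    return results


# Hypothetical stand-in for a fitted model: it just records what was requested.
def fake_fit(col, lag):
    return f"fit of {col} at lag {lag}"


nested = build_nested(range(1, 4), ["Overnight", "1M"], fake_fit)
print(nested["lag_2"]["1M"])   # fit of 1M at lag 2
```

Creating the inner dictionary inside the outer loop is what keeps each lag's results separate. With real fits, a lookup would take the form `nested["lag_5"]["Overnight"].summary()`.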