Python: Super dictionary of simple OLS

python arrays pandas for-loop

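I am trying to build a super dictionary that contains a number of lower-level dictionaries.

Concept

I have the interest rates of my retail bank for the past 12 years, and I am trying to model those rates using a portfolio of different bonds.

Regression formula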

Y_i - Y_i-1 = A + B(X_i - X_i-1) + E

In other words, Y_lag = alpha + beta(X_lag) + error term

Data
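The differenced terms in the formula can be sketched with pandas as follows. This is only an illustration: the toy values in `rates` are hypothetical and are not taken from the data in the question. It shows that subtracting a shifted copy of a series is the same as calling `diff`, and that a lag of `lag` leaves `lag` missing values at the start.

```python
import pandas as pd

# Hypothetical toy rates, used only to illustrate the differencing step
rates = pd.Series([1.0, 1.5, 1.25, 2.0, 2.5], name="Historic Rate")

lag = 2
manual = rates - rates.shift(lag)  # X_i - X_(i-lag); first `lag` rows are NaN
builtin = rates.diff(lag)          # pandas shortcut for the same operation

# Drop the leading NaN rows before passing the series to a regression
clean = builtin.dropna()
print(clean)
```

Because each lag removes a different number of leading rows, the dependent and independent series have to be trimmed to the same index before fitting.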

Note: Y = Historic Rate

df = pd.DataFrame(np.random.randint(low=0, high=10, size=(100,17)), 
              columns=['Historic Rate', 'Overnight', '1M', '3M', '6M','1Y','2Y','3Y','4Y','5Y','6Y','7Y','8Y','9Y','10Y','12Y','15Y'])

Code so far

#Import packages required for the analysis

import pandas as pd
import numpy as np
import statsmodels.api as sm

def Simulation(TotalSim,j):
    #super dictionary to hold all iterations of the loop
    Super_fit_d = {}
    for i in range(1,TotalSim):
        #Create a introductory loop to run the first set of regressions
        #Each loop produces a univariate regression
        #Each loop has a fixed lag of i

        fit_d = {}  # This will hold all of the fit results and summaries
        for col in [x for x in df.columns if x != 'Historic Rate']:
            Y = df['Historic Rate'] - df['Historic Rate'].shift(1)
            # Need to remove the NaN for fit
            Y = Y[Y.notnull()]

            X = df[col] - df[col].shift(i)
            X = X[X.notnull()]
            #Y now has more observations than X due to lag, drop rows to match
            Y = Y.drop(Y.index[0:i-1])

            if j = 1:
                X = sm.add_constant(X)  # Add a constant to the fit

            fit_d[col] = sm.OLS(Y,X).fit()
        #append the dictionary for each lag onto the super dictionary
        Super_fit_d[lag_i] = fit_d

#Check the output for one column
fit_d['Overnight'].summary()

#Check the output for one column in one segment of the super dictionary
Super_fit_d['lag_5'].fit_d['Overnight'].summary()

Simulation(11,1)

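Problem

I seem to be overwriting my dictionary on each loop, and I am not evaluating i correctly to index the iterations as lag_1, lag_2, lag_3, etc. How can I fix this?

Thanks in advance.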

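There are a few issues here:

You sometimes use i and sometimes lag_i, but only i is defined. For consistency, I changed all of them to lag_i.

if j = 1 is incorrect syntax. You need if j == 1.

You need to return the result dictionary from the function so that it persists after the loop. The code below returns Super_fit_d.

I got it to run by applying these changes: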
import pandas as pd
import numpy as np
import statsmodels.api as sm

df = pd.DataFrame(np.random.randint(low=0, high=10, size=(100,17)), 
              columns=['Historic Rate', 'Overnight', '1M', '3M', '6M','1Y','2Y','3Y','4Y','5Y','6Y','7Y','8Y','9Y','10Y','12Y','15Y'])

def Simulation(TotalSim, j):
    # Super dictionary to hold the results for every lag
    Super_fit_d = {}
    for lag_i in range(1, TotalSim):
        # Each pass of the outer loop uses a fixed lag of lag_i
        # and runs one univariate regression per column

        fit_d = {}  # This will hold all of the fit results and summaries
        for col in [x for x in df.columns if x != 'Historic Rate']:
            Y = df['Historic Rate'] - df['Historic Rate'].shift(1)
            # Need to remove the NaN for fit
            Y = Y[Y.notnull()]

            X = df[col] - df[col].shift(lag_i)
            X = X[X.notnull()]
            # Y now has more observations than X due to lag, drop rows to match
            Y = Y.drop(Y.index[0:lag_i-1])

            if j == 1:
                X = sm.add_constant(X)  # Add a constant to the fit

            fit_d[col] = sm.OLS(Y, X).fit()
        # Store the dictionary for this lag in the super dictionary,
        # once per lag, after the inner loop has finished
        Super_fit_d[lag_i] = fit_d
    return Super_fit_d



test_dict = Simulation(11,1)
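The answer keys the outer dictionary by the integer lag. If string keys such as lag_1, lag_2 are preferred, as the question asks, the same loop structure can be sketched as below. `build_nested` and `fit_one` are hypothetical names introduced for this illustration, and the lambda is only a placeholder standing in for `sm.OLS(Y, X).fit()`.

```python
def build_nested(columns, total_sim, fit_one):
    """Return {'lag_1': {col: result, ...}, 'lag_2': {...}, ...}."""
    super_fit = {}
    for lag in range(1, total_sim):
        fit_d = {}  # fresh inner dictionary for every lag
        for col in columns:
            fit_d[col] = fit_one(col, lag)
        # Assign once per lag, after the inner loop, under a string key
        super_fit[f"lag_{lag}"] = fit_d
    return super_fit


# Placeholder fit function: it just records which column and lag were used
results = build_nested(["Overnight", "1M"], 4, lambda col, lag: (col, lag))

print(sorted(results))         # ['lag_1', 'lag_2', 'lag_3']
print(results["lag_2"]["1M"])  # ('1M', 2)
```

Creating `fit_d` inside the outer loop is what prevents one lag from overwriting another: each lag gets its own inner dictionary, and the function returns the outer one.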

First lag
test_dict[1]['Overnight'].summary()
Out[76]:
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:          Historic Rate   R-squared:                       0.042
Model:                            OLS   Adj. R-squared:                  0.033
Method:                 Least Squares   F-statistic:                     4.303
Date:                Fri, 28 Sep 2018   Prob (F-statistic):             0.0407
Time:                        11:15:13   Log-Likelihood:                -280.39
No. Observations:                  99   AIC:                             564.8
Df Residuals:                      97   BIC:                             570.0
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0048      0.417     -0.012      0.991      -0.833       0.823
Overnight      0.2176      0.105      2.074      0.041       0.009       0.426
==============================================================================
Omnibus:                        1.449   Durbin-Watson:                   2.756
Prob(Omnibus):                  0.485   Jarque-Bera (JB):                1.180
Skew:                           0.005   Prob(JB):                        0.554
Kurtosis:                       2.465   Cond. No.                         3.98
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""

Second lag
test_dict[2]['Overnight'].summary()
Out[77]:
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:          Historic Rate   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                 -0.010
Method:                 Least Squares   F-statistic:                   0.06845
Date:                Fri, 28 Sep 2018   Prob (F-statistic):              0.794
Time:                        11:15:15   Log-Likelihood:                -279.44
No. Observations:                  98   AIC:                             562.9
Df Residuals:                      96   BIC:                             568.0
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0315      0.428      0.074      0.941      -0.817       0.880
Overnight      0.0291      0.111      0.262      0.794      -0.192       0.250
==============================================================================
Omnibus:                        2.457   Durbin-Watson:                   2.798
Prob(Omnibus):                  0.293   Jarque-Bera (JB):                1.735
Skew:                           0.115   Prob(JB):                        0.420
Kurtosis:                       2.391   Cond. No.                         3.84
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""
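I feel the problem of still overwriting the first dictionary with the same headings remains. I need levels of dictionaries, one for each lag.

Check my latest edit. I think the new output is what you want, but if it is not, I will change it.

Edited once more. The new output is a dict of dicts: the first-level index is the lag and the second-level index is the column name.

Thanks. I have to apologize, I can't look at this right now, but I will look at it later tonight.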