Python: A super dictionary of simple OLS fits


I am trying to build a super dictionary that contains a number of lower-level dictionaries.

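Concept

I have my retail bank's interest rates for the past 12 years, and I am trying to model that rate using a portfolio of different bonds.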

Regression formula

Y_i - Y_{i-1} = A + B(X_i - X_{i-1}) + E

In other words, Y_lag = alpha + beta(X_lag) + error term
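As a rough illustration of the formula above, the differenced series can be built with pandas `diff` and aligned by dropping incomplete rows together. This is only a sketch: the helper name `lagged_changes`, the reduced three-column data set, and the seeded random generator are assumptions made for the example and are not part of the original code. Following the code later in the post, the change in X is taken over `lag` periods while the change in Y is taken over one period.

```python
import numpy as np
import pandas as pd

# Illustrative data only (assumption): fewer columns than the real data set.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(100, 3)),
                  columns=['Historic Rate', 'Overnight', '1M'])

def lagged_changes(frame, x_col, lag, y_col='Historic Rate'):
    """Hypothetical helper: return aligned (Y, X) changes for one regressor."""
    y = frame[y_col].diff(1)      # Y_i - Y_{i-1}
    x = frame[x_col].diff(lag)    # X_i - X_{i-lag}
    # Put both series side by side and drop any row where either is NaN,
    # so Y and X always share exactly the same index.
    both = pd.concat([y, x], axis=1).dropna()
    return both[y_col], both[x_col]

Y, X = lagged_changes(df, 'Overnight', lag=3)
print(len(Y), len(X), Y.index[0])
```

Aligning through `concat` and `dropna` avoids having to work out by hand how many leading rows to drop for each lag.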

Data

Note: Y = Historic Rate

df = pd.DataFrame(np.random.randint(low=0, high=10, size=(100,17)), 
              columns=['Historic Rate', 'Overnight', '1M', '3M', '6M','1Y','2Y','3Y','4Y','5Y','6Y','7Y','8Y','9Y','10Y','12Y','15Y'])

Code so far

#Import packages required for the analysis

import pandas as pd
import numpy as np
import statsmodels.api as sm

def Simulation(TotalSim,j):
    #super dictionary to hold all iterations of the loop
    Super_fit_d = {}
    for i in range(1,TotalSim):
        #Create a introductory loop to run the first set of regressions
        #Each loop produces a univariate regression
        #Each loop has a fixed lag of i

        fit_d = {}  # This will hold all of the fit results and summaries
        for col in [x for x in df.columns if x != 'Historic Rate']:
            Y = df['Historic Rate'] - df['Historic Rate'].shift(1)
            # Need to remove the NaN for fit
            Y = Y[Y.notnull()]

            X = df[col] - df[col].shift(i)
            X = X[X.notnull()]
            #Y now has more observations than X due to lag, drop rows to match
            Y = Y.drop(Y.index[0:i-1])

            if j = 1:
                X = sm.add_constant(X)  # Add a constant to the fit

            fit_d[col] = sm.OLS(Y,X).fit()
        #append the dictionary for each lag onto the super dictionary
        Super_fit_d[lag_i] = fit_d

#Check the output for one column
fit_d['Overnight'].summary()

#Check the output for one column in one segment of the super dictionary
Super_fit_d['lag_5'].fit_d['Overnight'].summary()

Simulation(11,1)
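Question

I seem to be overwriting my dictionary on every loop, and I am not evaluating i correctly to index the iterations as lag_1, lag_2, lag_3, and so on. How can I fix this?


Thanks in advance.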

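There are a few problems here:

  • You sometimes use i and sometimes lag_i, but only i is defined. For consistency I changed them all to lag_i.
  • `if j = 1` is not valid syntax. You need `if j == 1`.
  • You need to return the dictionary from the function (Super_fit_d in the code below) so that it persists after the loop.
  • I got it working by applying these changes: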

    import pandas as pd
    import numpy as np
    import statsmodels.api as sm
    
    df = pd.DataFrame(np.random.randint(low=0, high=10, size=(100,17)), 
                  columns=['Historic Rate', 'Overnight', '1M', '3M', '6M','1Y','2Y','3Y','4Y','5Y','6Y','7Y','8Y','9Y','10Y','12Y','15Y'])
    
    def Simulation(TotalSim,j):
        Super_fit_d = {}
        for lag_i in range(1,TotalSim):
            #Create a introductory loop to run the first set of regressions
            #Each loop produces a univariate regression
            #Each loop has a fixed lag of i
    
            fit_d = {}  # This will hold all of the fit results and summaries
            for col in [x for x in df.columns if x != 'Historic Rate']:
                Y = df['Historic Rate'] - df['Historic Rate'].shift(1)
                # Need to remove the NaN for fit
                Y = Y[Y.notnull()]
    
                X = df[col] - df[col].shift(lag_i)
                X = X[X.notnull()]
                #Y now has more observations than X due to lag, drop rows to match
                Y = Y.drop(Y.index[0:lag_i-1])
    
                if j == 1:
                    X = sm.add_constant(X)  # Add a constant to the fit
    
                fit_d[col] = sm.OLS(Y,X).fit()
            # Store this lag's dictionary in the super dictionary, once per lag
            Super_fit_d[lag_i] = fit_d
        return Super_fit_d
    
    
    
    test_dict = Simulation(11,1)
    
    First lag:
    test_dict[1]['Overnight'].summary()
    Out[76]:
    """
                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:          Historic Rate   R-squared:                       0.042
    Model:                            OLS   Adj. R-squared:                  0.033
    Method:                 Least Squares   F-statistic:                     4.303
    Date:                Fri, 28 Sep 2018   Prob (F-statistic):             0.0407
    Time:                        11:15:13   Log-Likelihood:                -280.39
    No. Observations:                  99   AIC:                             564.8
    Df Residuals:                      97   BIC:                             570.0
    Df Model:                           1
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const         -0.0048      0.417     -0.012      0.991      -0.833       0.823
    Overnight      0.2176      0.105      2.074      0.041       0.009       0.426
    ==============================================================================
    Omnibus:                        1.449   Durbin-Watson:                   2.756
    Prob(Omnibus):                  0.485   Jarque-Bera (JB):                1.180
    Skew:                           0.005   Prob(JB):                        0.554
    Kurtosis:                       2.465   Cond. No.                         3.98
    ==============================================================================

    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    """
    
    Second lag:
    test_dict[2]['Overnight'].summary()
    Out[77]:
    """
                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:          Historic Rate   R-squared:                       0.001
    Model:                            OLS   Adj. R-squared:                 -0.010
    Method:                 Least Squares   F-statistic:                   0.06845
    Date:                Fri, 28 Sep 2018   Prob (F-statistic):              0.794
    Time:                        11:15:15   Log-Likelihood:                -279.44
    No. Observations:                  98   AIC:                             562.9
    Df Residuals:                      96   BIC:                             568.0
    Df Model:                           1
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const          0.0315      0.428      0.074      0.941      -0.817       0.880
    Overnight      0.0291      0.111      0.262      0.794      -0.192       0.250
    ==============================================================================
    Omnibus:                        2.457   Durbin-Watson:                   2.798
    Prob(Omnibus):                  0.293   Jarque-Bera (JB):                1.735
    Skew:                           0.115   Prob(JB):                        0.420
    Kurtosis:                       2.391   Cond. No.                         3.84
    ==============================================================================

    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    """
    
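I feel the problem of overwriting the first dictionary under the same keys is still there. I need a hierarchy of dictionaries, one level for each lag.

Check my latest edit. I think the new output is what you want, but if it is not, I will change it.

Edited once more. The new output is a dict of dicts: the first-level index is the lag and the second-level index is the column name.

Thanks. I have to apologize, I can't look at this right now, but I will take a look later tonight.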