Python: export regression results to a CSV file when using summary_col


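I'm running multiple regressions using financial data from Yahoo! Finance and Fama-French factors from Kenneth French's data library.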

Single-factor regression:

import statsmodels.formula.api as sm  # formula interface: sm.ols accepts R-style formulas

# HAC (Newey-West) standard errors with 1 lag
CAPM = sm.ols(formula='Exret ~ MKT', data=m).fit(cov_type='HAC', cov_kwds={'maxlags': 1})

Three-factor regression:

FF3 = sm.ols(formula='Exret ~ MKT + SMB + HML',
             data=m).fit(cov_type='HAC', cov_kwds={'maxlags': 1})
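To try the two regressions without the original data, here is a minimal sketch that fits the same CAPM and three-factor formulas on simulated returns. The random MKT, SMB, HML and Exret values are made up and only stand in for the Yahoo! Finance and Fama-French series; the name m just mirrors the question. It also illustrates that cov_type='HAC' changes only the standard errors, not the coefficient estimates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated stand-in for the question's data frame `m`.
rng = np.random.default_rng(0)
n = 500
m = pd.DataFrame({
    "MKT": rng.normal(0, 1, n),
    "SMB": rng.normal(0, 1, n),
    "HML": rng.normal(0, 1, n),
})
m["Exret"] = 0.01 * m["MKT"] - 0.003 * m["SMB"] - 0.006 * m["HML"] + rng.normal(0, 0.01, n)

# Same covariance settings as in the question: HAC with one lag.
hac = {"cov_type": "HAC", "cov_kwds": {"maxlags": 1}}

# Single-factor (CAPM) and three-factor (Fama-French) models.
CAPM = smf.ols(formula="Exret ~ MKT", data=m).fit(**hac)
FF3 = smf.ols(formula="Exret ~ MKT + SMB + HML", data=m).fit(**hac)

# The same three-factor model with the default (non-robust) covariance.
FF3_plain = smf.ols(formula="Exret ~ MKT + SMB + HML", data=m).fit()

print(FF3.params)     # identical point estimates in both fits
print(FF3.bse)        # HAC standard errors
print(FF3_plain.bse)  # classical standard errors
```

Only the covariance estimate depends on cov_type, which is why the stars and the numbers in parentheses change while the coefficients stay put.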

Then I use summary_col to create a table with significance stars:

from statsmodels.iolib.summary2 import summary_col
dfoutput = summary_col([CAPM, FF3], stars=True, float_format='%0.4f', model_names=['GOOG', 'GOOG'],
    info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)), 'Adjusted R2': lambda x: "{:.2f}".format(x.rsquared_adj)},
    regressor_order=['Intercept', 'MKT', 'SMB', 'HML'])

Output:

dfoutput
Out[311]: 
<class 'statsmodels.iolib.summary2.Summary'>
"""

=================================
             GOOG I       GOOG II  
---------------------------------
Intercept   -0.0009***   -0.0010***
            (0.0003)      (0.0003)  
MKT         0.0098***     0.0107*** 
            (0.0003)      (0.0003)  
SMB                      -0.0033***
                          (0.0006)  
HML                      -0.0063***
                          (0.0006)  
N              1930         1930      
Adjusted R2    0.37         0.42      
=================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
"""
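For the CSV export itself: the object returned by summary_col is a statsmodels.iolib.summary2.Summary (as the Out[311] line shows), and as far as I can tell it keeps the formatted table as a pandas DataFrame in its tables list. A sketch, assuming that holds for your statsmodels version and again using simulated data in place of m, is to write tables[0] with DataFrame.to_csv. The file name regression_results.csv is just an example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Hypothetical simulated data in place of the question's `m`.
rng = np.random.default_rng(1)
n = 300
m = pd.DataFrame(rng.normal(0, 1, (n, 3)), columns=["MKT", "SMB", "HML"])
m["Exret"] = 0.01 * m["MKT"] - 0.005 * m["HML"] + rng.normal(0, 0.01, n)

hac = {"cov_type": "HAC", "cov_kwds": {"maxlags": 1}}
CAPM = smf.ols("Exret ~ MKT", data=m).fit(**hac)
FF3 = smf.ols("Exret ~ MKT + SMB + HML", data=m).fit(**hac)

# Same call as in the question.
dfoutput = summary_col(
    [CAPM, FF3],
    stars=True,
    float_format="%0.4f",
    model_names=["GOOG", "GOOG"],
    info_dict={
        "N": lambda x: "{0:d}".format(int(x.nobs)),
        "Adjusted R2": lambda x: "{:.2f}".format(x.rsquared_adj),
    },
    regressor_order=["Intercept", "MKT", "SMB", "HML"],
)

# The formatted table is a DataFrame stored in the Summary's `tables` list.
table = dfoutput.tables[0]
table.to_csv("regression_results.csv")

# Read the file back to check what was written.
check = pd.read_csv("regression_results.csv", index_col=0)
print(check)
```

The cells are written exactly as displayed, i.e. as strings with the stars attached, and the rows holding the values in parentheses keep the blank row labels they have in the printed table.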

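You can change the standard errors in parentheses to t-statistics, but only by modifying the file summary2.py in the statsmodels library.
Simply replace the function _col_params() in that file with the following version: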
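def _col_params(result, float_format='%.4f', stars=True):
    '''Stack coefficients and t-statistics in a single column
    '''
    # summary_params and pd are already imported inside summary2.py.
    # .ix was removed in pandas 1.0, so positional .iloc is used throughout.

    # Extract parameters: columns are Coef., Std.Err., t (or z), P>|t| (or P>|z|), ...
    res = summary_params(result)
    # Format coefficients, standard errors and t-statistics as strings
    for col in res.columns[:3]:
        res[col] = res[col].apply(lambda x: float_format % x)
    # t-statistics (column 2) in parentheses, instead of the standard errors (column 1)
    res.iloc[:, 2] = '(' + res.iloc[:, 2] + ')'
    # Significance stars from the p-values (column 3, still numeric)
    if stars:
        idx = (res.iloc[:, 3] < .1).values
        res.iloc[idx, 0] = res.iloc[idx, 0] + '*'
        idx = (res.iloc[:, 3] < .05).values
        res.iloc[idx, 0] = res.iloc[idx, 0] + '*'
        idx = (res.iloc[:, 3] < .01).values
        res.iloc[idx, 0] = res.iloc[idx, 0] + '*'
    # Stack coefficients and t-statistics
    res = res.iloc[:, [0, 2]]
    res = res.stack()
    res = pd.DataFrame(res)
    res.columns = [str(result.model.endog_names)]
    return res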
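If editing summary2.py is not an option, a similar coefficient-over-(t) column can be built from the public params, tvalues and pvalues attributes of each fitted result and then written to CSV. This is a swapped-in alternative, not the _col_params patch above: tstat_column is a hypothetical helper name, the data are simulated, and the approach avoids depending on the private _col_params, whose signature may differ between statsmodels versions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def tstat_column(result, float_format="%.4f", stars=True):
    """Hypothetical helper: coefficients stacked over t-statistics in parentheses.

    Uses only the public params, tvalues and pvalues attributes of a fitted
    result, so statsmodels' own files are left untouched.
    """
    labels, values = [], []
    for name in result.params.index:
        coef = float_format % result.params[name]
        if stars:
            # One star per threshold the p-value falls below: 10%, 5%, 1%.
            p = result.pvalues[name]
            coef += "*" * sum(1 for cutoff in (0.1, 0.05, 0.01) if p < cutoff)
        labels += [(name, "coef"), (name, "t")]
        values += [coef, "(" + float_format % result.tvalues[name] + ")"]
    return pd.Series(values, index=pd.MultiIndex.from_tuples(labels))


# Hypothetical simulated data standing in for the question's `m`.
rng = np.random.default_rng(3)
n = 300
m = pd.DataFrame(rng.normal(0, 1, (n, 3)), columns=["MKT", "SMB", "HML"])
m["Exret"] = 0.01 * m["MKT"] - 0.005 * m["HML"] + rng.normal(0, 0.01, n)

hac = {"cov_type": "HAC", "cov_kwds": {"maxlags": 1}}
CAPM = smf.ols("Exret ~ MKT", data=m).fit(**hac)
FF3 = smf.ols("Exret ~ MKT + SMB + HML", data=m).fit(**hac)

# Put the models side by side; regressors missing from a model become "".
table = pd.concat([tstat_column(CAPM), tstat_column(FF3)], axis=1, keys=["CAPM", "FF3"])
table = table.fillna("")
table.to_csv("regression_tstats.csv")
print(table)
```

Rows such as N or adjusted R2 could be appended to the same DataFrame before calling to_csv if you need them in the file.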

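As you can see, the results now show t-statistics instead of standard errors (the footer still reads "Standard errors in parentheses." because summary_col always appends that line):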
print(results)

================================================
              Model    Model    Model    Model  
               (1)      (2)      (3)      (4)   
------------------------------------------------
cons         39.44*** 39.44*** 49.68*** 50.02***
             (24.44)  (24.32)  (7.85)   (7.80)  
displacement                            0.00    
                                        (0.44)  
length                         -0.10*   -0.09   
                               (-1.67)  (-1.63) 
price                 -0.00    -0.00    -0.00   
                      (-0.57)  (-1.03)  (-1.03) 
weight       -0.01*** -0.01*** -0.00*   -0.00*  
             (-11.60) (-9.42)  (-1.72)  (-1.67) 
N            74       74       74       74      
R2           0.65     0.65     0.67     0.67    
================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
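As a quick sanity check on the numbers in parentheses: in statsmodels, tvalues are the coefficients divided by their standard errors (here the HAC ones), so they can be recomputed directly. A minimal sketch on simulated data, where the data frame m is hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data; any fitted OLS result behaves the same way.
rng = np.random.default_rng(2)
m = pd.DataFrame({"MKT": rng.normal(size=200)})
m["Exret"] = 0.5 * m["MKT"] + rng.normal(size=200)

res = smf.ols("Exret ~ MKT", data=m).fit(cov_type="HAC", cov_kwds={"maxlags": 1})

# The statistic shown in parentheses is the coefficient over its standard error.
manual_t = res.params / res.bse
print(pd.DataFrame({"coef": res.params, "se": res.bse, "t": res.tvalues, "coef/se": manual_t}))
```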
