Python 2.7 regression analysis using statsmodels

Please help me get the output of this code. Why is the output of this code nan? What am I doing wrong?

import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
import matplotlib.pyplot as plt
import math
import datetime as dt
#importing Data
es_url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/hbrbcpe.txt'
vs_url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/h_vstoxx.txt'
#creating DataFrame
cols=['SX5P','SX5E','SXXP','SXXE','SXXF','SXXA','DK5f','DKXF']
es=pd.read_csv(es_url,index_col=0,parse_dates=True,sep=';',dayfirst=True,header=None,skiprows=4,names=cols)
vs=pd.read_csv(vs_url,index_col=0,header=2,parse_dates=True,sep=',',dayfirst=True)
data=pd.DataFrame({'EUROSTOXX' : es['SX5E'][es.index > dt.datetime(1999,1,1)]},dtype=float)
data=data.join(pd.DataFrame({'VSTOXX' : vs['V2TX'][vs.index > dt.datetime(1999,1,1)]},dtype=float))
data=data.fillna(method='ffill')
rets=(((data/data.shift(1))-1)*100).round(2)
xdat = rets['EUROSTOXX']
ydat = rets['VSTOXX']
#regression analysis
model = smf.ols('ydat ~ xdat',data=rets).fit()
print model.summary()

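The problem is that when you compute rets, a division by zero produces inf. In addition, using shift introduces NaNs, so the missing values have to be handled in some way before you proceed to the regression.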

Walk through this example using your data and see:

df = data.loc['2016-03-20':'2016-04-01'].copy()

df looks like:

            EUROSTOXX   VSTOXX
2016-03-21    3048.77  35.6846
2016-03-22    3051.23  35.6846
2016-03-23    3042.42  35.6846
2016-03-24    2986.73  35.6846
2016-03-25       0.00  35.6846
2016-03-28       0.00  35.6846
2016-03-29    3004.87  35.6846
2016-03-30    3044.10  35.6846
2016-03-31    3004.93  35.6846
2016-04-01    2953.28  35.6846

Shifting by 1 and dividing:

df = (((df/df.shift(1))-1)*100).round(2)

prints:

             EUROSTOXX  VSTOXX
2016-03-21         NaN     NaN
2016-03-22    0.080688     0.0
2016-03-23   -0.288736     0.0
2016-03-24   -1.830451     0.0
2016-03-25 -100.000000     0.0
2016-03-28         NaN     0.0
2016-03-29         inf     0.0
2016-03-30    1.305547     0.0
2016-03-31   -1.286751     0.0
2016-04-01   -1.718842     0.0

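Takeaways: shifting by 1 always creates a NaN in the first row. Dividing 0.00 by 0.00 produces NaN, while dividing a nonzero value by 0.00 produces inf.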
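The two cases can be reproduced without downloading anything. The following is a minimal sketch that hard-codes five EUROSTOXX levels copied from the excerpt above; the variable names are illustrative only, and rounding is left out so the raw values are visible.

```python
import numpy as np
import pandas as pd

# Five consecutive EUROSTOXX levels copied from the excerpt
# (2016-03-24 through 2016-03-30), including the two 0.00 rows.
prices = pd.Series([2986.73, 0.00, 0.00, 3004.87, 3044.10])

# Same return formula as in the question, without the final round(2).
rets = (prices / prices.shift(1) - 1) * 100

# Row 0: NaN  (shift(1) leaves nothing to divide by)
# Row 1: -100 (0.00 / 2986.73)
# Row 2: NaN  (0.00 / 0.00)
# Row 3: inf  (3004.87 / 0.00)
print(rets)
```

Neither NaN nor inf can be used in the least-squares fit, which is why they need to be dealt with before calling ols.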

One possible solution for handling the missing values:

...
xdat = rets['EUROSTOXX']
ydat = rets['VSTOXX']

# handle missing values
messed_up_indices = xdat[xdat.isin([-np.inf, np.inf, np.nan]) == True].index
xdat[messed_up_indices] = xdat[messed_up_indices].replace([-np.inf, np.inf], np.nan)
xdat[messed_up_indices] = xdat[messed_up_indices].fillna(xdat.mean())
ydat[messed_up_indices] = ydat[messed_up_indices].fillna(0.0)

#regression analysis
model = smf.ols('ydat ~ xdat',data=rets, missing='raise').fit()
print(model.summary())

Note that I added the missing='raise' parameter to ols to see what is going on.

The end result prints:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   ydat   R-squared:                       0.259
Model:                            OLS   Adj. R-squared:                  0.259
Method:                 Least Squares   F-statistic:                     1593.
Date:                Wed, 03 Jan 2018   Prob (F-statistic):          5.76e-299
Time:                        12:01:14   Log-Likelihood:                -13856.
No. Observations:                4554   AIC:                         2.772e+04
Df Residuals:                    4552   BIC:                         2.773e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.1608      0.075      2.139      0.033       0.013       0.308
xdat          -1.4209      0.036    -39.912      0.000      -1.491      -1.351
==============================================================================
Omnibus:                     4280.114   Durbin-Watson:                   2.074
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          4021394.925
Skew:                          -3.446   Prob(JB):                         0.00
Kurtosis:                     148.415   Cond. No.                         2.11
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
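
A different approach, not the one used above, is to clean the price levels before computing returns, so that the -100, NaN and inf rows never appear. The sketch below assumes that a level of exactly 0.00 is a placeholder for a day without a quote rather than a real index value; that assumption is mine and should be checked against the data source. The small DataFrame is hard-coded from the excerpt for illustration.

```python
import numpy as np
import pandas as pd

# Levels copied from the excerpt (2016-03-24 through 2016-03-30).
prices = pd.DataFrame({
    "EUROSTOXX": [2986.73, 0.00, 0.00, 3004.87, 3044.10],
    "VSTOXX": [35.6846, 35.6846, 35.6846, 35.6846, 35.6846],
})

# Assumption: 0.00 means "no quote", so treat it as missing and
# carry the last valid level forward.
cleaned = prices.replace(0.0, np.nan).ffill()

# Percentage returns, as in the question.
rets = (cleaned / cleaned.shift(1) - 1) * 100

# Drop the leading NaN from shift(1); the replace guards against any
# inf that might remain in other data.
rets = rets.replace([np.inf, -np.inf], np.nan).dropna()

print(rets)
```

Compared with filling the bad returns with the column mean, this keeps artificial values out of the regression, at the cost of zero returns on the forward-filled days.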