Python statsmodels implementation: how standard errors (SEs) are calculated


I am trying to figure out how statsmodels.api (imported as sm) computes standard errors:

import pandas as pd
import statsmodels.api as sm
import numpy as np

data = pd.read_csv("Advertising.csv", index_col=0)
X = sm.add_constant(data[['TV', 'radio' ,'newspaper']])
y = data["sales"]
model = sm.OLS(y, X).fit()
y_hat = np.dot(X, model.params)
residuals = y - y_hat
var = (np.sum(residuals**2))/(200-3-1)

As I understand it, this expression gives the standard error matrix, whose diagonal holds the standard error of each parameter:

np.sqrt(var * (np.dot(X.T, X)**-1))

array([[0.11918358, 0.00982868, 0.02471008, 0.02156167],
   [0.00982868, 0.00070041, 0.00201736, 0.00175762],
   [0.02471008, 0.00201736, 0.00432171, 0.00415011],
   [0.02156167, 0.00175762, 0.00415011, 0.0031791 ]])

But the standard errors reported by model.summary() differ from the values above:

"""
                            OLS Regression Results
==============================================================================
Dep. Variable:                  sales   R-squared:                       0.897
Model:                            OLS   Adj. R-squared:                  0.896
Method:                 Least Squares   F-statistic:                     570.3
Date:                Sun, 10 Nov 2019   Prob (F-statistic):           1.58e-96
Time:                        08:29:40   Log-Likelihood:                -386.18
No. Observations:                 200   AIC:                             780.4
Df Residuals:                     196   BIC:                             793.6
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.9389      0.312      9.422      0.000       2.324       3.554
TV             0.0458      0.001     32.809      0.000       0.043       0.049
radio          0.1885      0.009     21.893      0.000       0.172       0.206
newspaper     -0.0010      0.006     -0.177      0.860      -0.013       0.011
==============================================================================
Omnibus:                       60.414   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              151.241
Skew:                          -1.327   Prob(JB):                     1.44e-33
Kurtosis:                       6.332   Cond. No.                         454.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""

How are these SEs calculated?

The calculation looks correct in principle, but operations on numpy arrays are mostly elementwise.

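If xtx = X.T.dot(X), then xtx**-1 is the elementwise inverse in numpy: each entry is replaced by its reciprocal. For a matrix inverse we need one of the linalg functions from numpy or scipy, i.e. np.linalg.inv(xtx). The standard errors are then the square roots of the diagonal of var * np.linalg.inv(xtx).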
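The difference between the two inverses can be sketched as follows. This is a minimal illustration in plain NumPy, not the asker's exact setup: the data are synthetic (Advertising.csv is not assumed to be available), and the names `rng`, `beta_true`, `sigma2`, and `se` are made up for the example.

```python
import numpy as np

# Synthetic stand-in for the advertising data: 200 rows, a constant plus 3 regressors.
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([3.0, 0.05, 0.2, 0.0])  # arbitrary illustrative coefficients
y = X @ beta_true + rng.normal(size=n)

xtx = X.T @ X

# Elementwise reciprocal: this is what `xtx ** -1` does on a float array.
elementwise = xtx ** -1

# True matrix inverse.
xtx_inv = np.linalg.inv(xtx)

# OLS estimate, residual variance, and standard errors.
beta_hat = xtx_inv @ X.T @ y
residuals = y - X @ beta_hat
sigma2 = residuals @ residuals / (n - X.shape[1])  # 200 - 3 - 1 = 196 degrees of freedom
se = np.sqrt(np.diag(sigma2 * xtx_inv))

print(np.allclose(xtx_inv @ xtx, np.eye(4)))      # the matrix inverse recovers the identity
print(np.allclose(elementwise @ xtx, np.eye(4)))  # the elementwise reciprocal does not
print(se)
```

Only the diagonal of `sigma2 * xtx_inv` is meaningful as squared standard errors; the off-diagonal entries are covariances and can be negative, so taking the square root of the full matrix is best avoided.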

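The computation inside statsmodels' OLS differs from this. By default it uses the Moore-Penrose pseudoinverse, pinv, which is based on the singular value decomposition (SVD); optionally, the computation can be based on the QR decomposition.

Both decompositions are applied to the design matrix exog directly, which gives better numerical precision than inverting the moment matrix with inv(xtx). However, the decomposition-based approach is usually slower than the plain inverse, so it trades speed for precision.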
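As a rough sketch of the two decomposition routes described above, the following plain-NumPy code computes the coefficients and standard errors once through the pseudoinverse of the design matrix and once through its QR decomposition. It is an illustration under assumptions, not statsmodels' source code: the data are synthetic and all variable names are made up for the example.

```python
import numpy as np

# Synthetic design matrix (constant plus 3 regressors) and response.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([3.0, 0.05, 0.2, 0.0]) + rng.normal(size=n)
df_resid = n - X.shape[1]

# Route 1: Moore-Penrose pseudoinverse of X (computed via SVD inside np.linalg.pinv).
pinv_X = np.linalg.pinv(X)
beta_pinv = pinv_X @ y
# For a full-column-rank X, pinv(X) @ pinv(X).T equals (X'X)^-1.
norm_cov_pinv = pinv_X @ pinv_X.T

# Route 2: QR decomposition, X = Q R with R upper triangular.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)
# (X'X)^-1 = (R'R)^-1 = R^-1 R^-T, so X'X is never formed explicitly.
R_inv = np.linalg.inv(R)
norm_cov_qr = R_inv @ R_inv.T

# The residual variance is the same for both routes since the fits agree.
resid = y - X @ beta_pinv
sigma2 = resid @ resid / df_resid

se_pinv = np.sqrt(np.diag(sigma2 * norm_cov_pinv))
se_qr = np.sqrt(np.diag(sigma2 * norm_cov_qr))
print(se_pinv)
print(se_qr)
```

In statsmodels itself the choice between the two is made through the `method` argument of `fit` (`"pinv"` by default, or `"qr"`), and the resulting standard errors are available as `model.bse`.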

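xtx**-1 is the elementwise inverse in numpy. For a matrix inverse, try np.linalg.inv(xtx).

@Josef You are right. Please post it as an answer and, if you like, I will mark it as accepted. Thanks a lot!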