Python statsmodels implementation: how standard errors (SEs) are calculated
I am trying to work out how statsmodels.api computes standard errors:
import pandas as pd
import statsmodels.api as sm
import numpy as np
data = pd.read_csv("Advertising.csv", index_col=0)
X = sm.add_constant(data[['TV', 'radio' ,'newspaper']])
y = data["sales"]
model = sm.OLS(y, X).fit()
y_hat = np.dot(X, model.params)
residuals = y - y_hat
var = (np.sum(residuals**2))/(200-3-1)
As I understand it, this expression gives the standard error matrix, whose diagonal holds the standard error of each parameter:
np.sqrt(var * (np.dot(X.T, X)**-1))
array([[0.11918358, 0.00982868, 0.02471008, 0.02156167],
[0.00982868, 0.00070041, 0.00201736, 0.00175762],
[0.02471008, 0.00201736, 0.00432171, 0.00415011],
[0.02156167, 0.00175762, 0.00415011, 0.0031791 ]])
But the standard errors reported by model.summary() differ from the values above:
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:                  sales   R-squared:                       0.897
Model:                            OLS   Adj. R-squared:                  0.896
Method:                 Least Squares   F-statistic:                     570.3
Date:                Sun, 10 Nov 2019   Prob (F-statistic):           1.58e-96
Time:                        08:29:40   Log-Likelihood:                -386.18
No. Observations:                 200   AIC:                             780.4
Df Residuals:                     196   BIC:                             793.6
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.9389      0.312      9.422      0.000       2.324       3.554
TV             0.0458      0.001     32.809      0.000       0.043       0.049
radio          0.1885      0.009     21.893      0.000       0.172       0.206
newspaper     -0.0010      0.006     -0.177      0.860      -0.013       0.011
==============================================================================
Omnibus:                       60.414   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              151.241
Skew:                          -1.327   Prob(JB):                     1.44e-33
Kurtosis:                       6.332   Cond. No.                         454.
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""
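How are these SEs computed?

The formula looks correct, but most operations on numpy arrays are element-wise. If xtx = X.T.dot(X), then xtx**-1 is the element-wise reciprocal in numpy, not the matrix inverse. Matrix inversion requires a linalg function from numpy or scipy, i.e. np.linalg.inv(xtx).
The computation in statsmodels OLS is different: by default it uses the Moore-Penrose pseudoinverse (pinv), which is based on the SVD, or optionally a QR decomposition. Both decompositions are applied to the design matrix exog, which gives better numerical precision than inverting the moment matrix with inv(xtx). However, the former is usually slower than the latter, trading speed for precision.

xtx**-1 is the element-wise inverse in numpy. For the matrix inverse, try np.linalg.inv(xtx).
@Josef You are right. Please post it as an answer and I will accept it. Thanks a lot!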
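
To make the difference concrete, here is a minimal sketch of the corrected calculation. It uses synthetic data rather than Advertising.csv, so the numbers will not match the summary above; the seed, coefficients and variable names are assumptions made for illustration. It computes the standard errors once from the matrix inverse of X'X and once from the pseudoinverse of the design matrix, which is one way to write the pinv route described in the answer, not necessarily the exact code path inside statsmodels.

```python
import numpy as np

# Synthetic stand-in for the advertising data (assumed values, illustration only).
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # constant + 3 regressors
beta_true = np.array([3.0, 0.05, 0.2, 0.0])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# OLS fit and residual variance with n - (k + 1) degrees of freedom.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
df_resid = n - X.shape[1]
sigma2 = residuals @ residuals / df_resid

# Route 1: true matrix inverse of the moment matrix X'X.
xtx = X.T @ X
se_inv = np.sqrt(np.diag(sigma2 * np.linalg.inv(xtx)))

# Route 2: pseudoinverse of the design matrix; pinv(X) @ pinv(X).T equals inv(X'X)
# when X has full column rank.
pinv_X = np.linalg.pinv(X)
se_pinv = np.sqrt(np.diag(sigma2 * (pinv_X @ pinv_X.T)))

# The expression from the question: element-wise reciprocal, not a matrix inverse.
elementwise = xtx ** -1.0

print("SE via inv(X'X): ", se_inv)
print("SE via pinv(X):  ", se_pinv)
```

With the real data, the same diagonal values can be compared against model.bse, the attribute where a fitted statsmodels model stores its standard errors.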