Linear regression in Python using linear algebra


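Am I misinterpreting the formulas from Wikipedia in my Python code? The output of what I tried looks right, but the numbers are slightly off, for example in the decimal places.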

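You may or may not want the +1 in what you call delta, depending on whether X contains a "constant" column (i.e. all values = 1).

Otherwise it looks fine, if a bit un-Pythonic. I would be tempted to write them as: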

import numpy as np
from numpy.linalg import inv
from scipy.linalg import sqrtm

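def solve_theta(X, Y):
    # Solve the normal equations (X'X) theta = X'Y without an explicit inverse
    return np.linalg.solve(X.T @ X, X.T @ Y)

def ss_res(X, Y, theta):
    # Residual sum of squares
    res = Y - (X @ theta)
    return np.sum(res ** 2)

def std_error(X, Y, theta):
    # Assumes X has full column rank, so its rank equals its column count
    nr, rank = X.shape
    resid_df = nr - rank
    residvar = ss_res(X, Y, theta) / resid_df
    var_theta = residvar * inv(X.T @ X)
    # Note: this takes the diagonal of the matrix square root, which is
    # not the same as np.sqrt(np.diag(var_theta))
    return np.diag(sqrtm(var_theta))[:,None]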

Note: this uses the @ matrix-multiplication operator (available since Python 3.5) rather than writing out .dot().
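On the earlier point about the constant column, the following is a minimal sketch of how the residual degrees of freedom work out either way. The data, the seed, and the names x, X_const, resid_df and resid_df_alt are made up for illustration and are not from the original question.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only

n = 50
x = rng.normal(size=(n, 2))  # two predictors, no constant column
y = 1.5 + x @ np.array([[2.0], [-1.0]]) + 0.1 * rng.normal(size=(n, 1))

# Prepend a column of ones so that the first coefficient is the intercept.
X_const = np.column_stack((np.ones(n), x))

theta = np.linalg.solve(X_const.T @ X_const, X_const.T @ y)

# With the constant column inside X, the column count already includes the
# intercept, so no extra "+1" is needed in the degrees of freedom.
resid_df = n - X_const.shape[1]

# Without the constant column, the intercept has to be counted separately.
resid_df_alt = n - (x.shape[1] + 1)
```

Both counts agree, which is why the +1 is only needed when X itself has no column of ones.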

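This algorithm is not especially numerically stable, because forming X.T @ X squares the condition number of X; you may want to look at using an SVD or QR decomposition instead. Here is an approachable description of how to use the SVD: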

John Mandel (1982), "Use of the Singular Value Decomposition in Regression Analysis"
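As a rough illustration of those alternatives, here is a minimal sketch using only NumPy. The function names solve_theta_qr and solve_theta_svd are hypothetical and not part of the original answer, and both assume X has full column rank.

```python
import numpy as np

def solve_theta_qr(X, Y):
    # Reduced QR: X = Q R, with Q (n x k, orthonormal columns)
    # and R (k x k, upper triangular).
    Q, R = np.linalg.qr(X)
    # The least-squares problem reduces to R theta = Q'Y.
    # np.linalg.solve is a general solver; it does not exploit
    # the triangular structure of R.
    return np.linalg.solve(R, Q.T @ Y)

def solve_theta_svd(X, Y):
    # np.linalg.lstsq relies on an SVD-based LAPACK routine.
    theta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    return theta

# Small usage example on made-up data.
rng = np.random.default_rng(1)  # arbitrary seed
X_demo = rng.normal(size=(30, 3))
Y_demo = rng.normal(size=(30, 1))
theta_qr = solve_theta_qr(X_demo, Y_demo)
theta_svd = solve_theta_svd(X_demo, Y_demo)
```

On well-conditioned data such as this, both should agree with the normal-equations solution; the difference matters when the columns of X are nearly collinear.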

We can test this by creating some dummy data:

np.random.seed(42)

N = 20
K = 3

true_theta = np.random.randn(K, 1) * 5
X = np.random.randn(N, K)
Y = np.random.randn(N, 1) + X @ true_theta

and running the above code on it:

theta = solve_theta(X, Y)
sse = std_error(X, Y, theta)

print(np.column_stack((theta, sse)))

which gives:

[[ 2.23556391  0.35678574]
 [-0.40643163  0.24751913]
 [ 3.14687637  0.26461827]]

We can use statsmodels to check this:

import statsmodels.api as sm

sm.OLS(Y, X).fit().summary()

which gives:

                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             2.2356      0.358      6.243      0.000       1.480       2.991
x2            -0.4064      0.248     -1.641      0.119      -0.929       0.116
x3             3.1469      0.266     11.812      0.000       2.585       3.709

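These agree closely: the coefficients match to the displayed precision, and the standard errors differ by roughly 0.001.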
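The small remaining gap in the standard errors appears to come from std_error taking the diagonal of the matrix square root sqrtm(var_theta). The usual definition is the elementwise square root of the diagonal of the coefficient covariance matrix. A minimal sketch of that variant follows; std_error_elementwise is a hypothetical name, not part of the original answer, and it assumes X has full column rank.

```python
import numpy as np

def std_error_elementwise(X, Y, theta):
    n, k = X.shape
    resid = Y - X @ theta
    # Unbiased estimate of the residual variance
    resid_var = np.sum(resid ** 2) / (n - k)
    # Estimated covariance matrix of the coefficients
    var_theta = resid_var * np.linalg.inv(X.T @ X)
    # Elementwise square root of the variances on the diagonal
    return np.sqrt(np.diag(var_theta))[:, None]

# Tiny made-up example where the result can be checked by hand:
# X'X = 2 I, so each standard error is sqrt(resid_var / 2).
X_small = np.vstack((np.eye(3), np.eye(3)))
Y_small = np.array([[1.0], [2.0], [3.0], [2.0], [4.0], [6.0]])
theta_small = np.linalg.solve(X_small.T @ X_small, X_small.T @ Y_small)
se_small = std_error_elementwise(X_small, Y_small, theta_small)
```

For a positive semi-definite covariance matrix, the diagonal of its matrix square root is never larger than the elementwise square root of its diagonal, which is consistent with the NumPy figures above being slightly smaller than the statsmodels ones.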