Python: multiple linear regression with sklearn does not match the normal equation of the cost function

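I have to fit my data to a multivariate linear model. However, sklearn.linear_model gives results that differ from those predicted by the normal equation. Here is the code for both: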

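   import numpy as np

   x=np.arange(12).reshape(3,4)
   y=np.arange(3,6).reshape(3,1)
   x=np.insert(x,0,1,axis=1)   # prepend a column of ones for the intercept
   def normal(X,y):
       # normal-equation solution: pinv(X^T X) X^T y
       return np.dot(np.dot(np.linalg.pinv(np.dot(X.T,X)),X.T),y)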

   normal(x,y)
   >>> [[ 0.4375 ]
       [-0.59375]
       [-0.15625]
       [ 0.28125]
       [ 0.71875]]
   from sklearn import linear_model
   reg=linear_model.LinearRegression()
   reg.fit(x,y)
   reg.coef_
   >>> [[ 0.    ,  0.0625,  0.0625,  0.0625,  0.0625]]

Is my code correct?

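What is happening is that you included the intercept term in the data matrix. By default, scikit-learn's LinearRegression class fits an intercept term automatically, so you do not need to insert a column of ones into the matrix: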

import numpy as np
from sklearn import linear_model
x=np.arange(12).reshape(3,4)
y=np.arange(3,6).reshape(3,1)
reg=linear_model.LinearRegression()
reg.fit(x,y)

As a result, we get the coefficients and the intercept term:

In [32]: reg.coef_
Out[32]: array([[ 0.0625,  0.0625,  0.0625,  0.0625]])

In [33]: reg.intercept_
Out[33]: array([ 2.625])

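We can verify that this is the correct output by taking the dot product of each row of the matrix with the coefficients and adding the intercept term at the end: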

In [34]: x.dot(reg.coef_.T) + reg.intercept_
Out[34]:
array([[ 3.],
       [ 4.],
       [ 5.]])
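
To see where 0.0625 and 2.625 come from, here is a NumPy-only sketch of what fitting an intercept roughly amounts to: center x and y, solve the least-squares problem on the centered data, then recover the intercept from the means. This is an illustration under that assumption, not a copy of scikit-learn's internals. The explicit rcond cutoff is my own choice, so that near-zero singular values are treated as exactly zero.

```python
import numpy as np

x = np.arange(12).reshape(3, 4).astype(float)
y = np.arange(3, 6).reshape(3, 1).astype(float)

# Column means of the features and of the target
x_mean = x.mean(axis=0)
y_mean = y.mean(axis=0)

# Minimum-norm least-squares solution on the centered data.
# rcond=1e-10 is an assumed cutoff for treating singular values as zero.
coef, *_ = np.linalg.lstsq(x - x_mean, y - y_mean, rcond=1e-10)

# The intercept is whatever is left once the means are accounted for
intercept = y_mean - x_mean @ coef

print(coef.ravel())   # approximately 0.0625 for every feature
print(intercept)      # approximately 2.625
```

These values agree with reg.coef_ and reg.intercept_ shown above.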

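Now, if you specifically want to match what the normal equation gives you, that is fine: you can insert the column of ones. However, you then need to disable intercept fitting, because the column you inserted manually already plays that role.

Therefore: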

x=np.arange(12).reshape(3,4)
y=np.arange(3,6).reshape(3,1)
x=np.insert(x,0,1,axis=1)
reg = linear_model.LinearRegression(fit_intercept=False)
reg.fit(x,y)

Doing this, we now get the coefficients:

In [37]: reg.coef_
Out[37]: array([[ 0.4375 , -0.59375, -0.15625,  0.28125,  0.71875]])

This matches the output of the normal equation.
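
It may help to see why two different coefficient vectors can both be valid here. With 3 samples and 5 columns the system is underdetermined, so many weight vectors reproduce y exactly. The sketch below plugs in the two solutions quoted in this thread and checks that both fit the data; the names w_normal and w_default are mine, introduced only for illustration.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
y = np.arange(3, 6).reshape(3, 1)
X = np.insert(x, 0, 1, axis=1)  # column of ones in front

# Solution from the normal equation (and from fit_intercept=False)
w_normal = np.array([[0.4375, -0.59375, -0.15625, 0.28125, 0.71875]]).T

# Solution from the default fit, written as [intercept, coefficients...]
w_default = np.array([[2.625, 0.0625, 0.0625, 0.0625, 0.0625]]).T

# Both weight vectors reproduce y exactly
print(np.allclose(X @ w_normal, y))
print(np.allclose(X @ w_default, y))

# They differ in length: the normal-equation vector is the shorter one
print(np.linalg.norm(w_normal), np.linalg.norm(w_default))
```

The pseudoinverse picks the solution with the smallest norm over all five weights, including the one on the column of ones, whereas the default fit treats the intercept separately. That would explain why the numbers differ even though the predictions are identical.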

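I don't think the normal function is right. np.linalg.pinv returns the pseudoinverse of its input, which (when X has full column rank) can be computed as np.linalg.inv(X.T.dot(X)).dot(X.T). So you are doing some combination of an inverse and a pseudoinverse. normal should return np.linalg.pinv(X).dot(y).

The pinv call is there to cover the case of a non-invertible matrix. It does not affect the answer in any way.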
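
A quick numerical check of the two formulations, as a sketch rather than a proof: for this data X.T.dot(X) is singular, so a plain inverse is not available, but pinv(X.T X) X.T y and pinv(X) y should give the same vector. The helper names normal_original and normal_direct and the RCOND cutoff are assumptions of mine, added so that tiny singular values are reliably treated as zero.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
y = np.arange(3, 6).reshape(3, 1)
X = np.insert(x, 0, 1, axis=1).astype(float)

RCOND = 1e-10  # assumed relative cutoff for small singular values

def normal_original(X, y):
    # Formulation from the question: pinv(X^T X) X^T y
    return np.linalg.pinv(X.T @ X, rcond=RCOND) @ X.T @ y

def normal_direct(X, y):
    # Formulation suggested in the answer: pinv(X) y
    return np.linalg.pinv(X, rcond=RCOND) @ y

# X^T X is 5x5 but has rank 2, so it cannot be inverted directly
print(np.linalg.matrix_rank(X.T @ X, tol=1e-8))

w1 = normal_original(X, y)
w2 = normal_direct(X, y)
print(np.allclose(w1, w2))  # the two formulations agree on this data
print(w2.ravel())
```

Going through pinv(X) directly avoids forming X.T X, which squares the condition number, so it is usually the numerically safer of the two.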