Why can't the SGDRegressor function in sklearn converge to the correct optimum?


python numpy machine-learning scikit-learn


I am practicing with SGDRegressor in sklearn but ran into a problem, which I have reduced to the following code:

import numpy as np
from sklearn.linear_model import SGDRegressor

X = np.array([0,0.5,1]).reshape((3,1))
y = np.array([0,0.5,1]).reshape((3,1))

sgd = SGDRegressor()  
sgd.fit(X, y.ravel())

print("intercept=", sgd.intercept_)
print("coef=", sgd.coef_)

This is the output:

intercept= [0.19835632]
coef= [0.18652387]

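Every run gives roughly intercept=0.19 and coef=0.18, but the correct answer is clearly intercept=0 and coef=1. Even in this simple example, the program fails to find the correct parameters. I don't know where I went wrong.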

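SGD (stochastic gradient descent) is meant for large-scale data. For a data set this tiny, I suggest using plain linear regression. As the "no free lunch" theorem states, no single model suits every problem, so you should routinely try different models to find the best one (while also understanding your data: distribution type, diversity, skewness, and so on). Take a look at the following model: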

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X,y.ravel())
lr.predict([[0],[0.5],[1]])
# output -> array([1.11022302e-16, 5.00000000e-01, 1.00000000e+00])
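For a problem this small, the least-squares solution can also be computed directly. The following is a minimal sketch using `np.linalg.lstsq` on the same three points; the names `A`, `intercept_exact`, and `coef_exact` are illustrative and not part of the original post.

```python
import numpy as np

X = np.array([0, 0.5, 1]).reshape((3, 1))
y = np.array([0, 0.5, 1])

# Design matrix with a leading column of ones for the intercept term.
A = np.hstack([np.ones((3, 1)), X])

# Solve min ||A w - y||^2 directly; w holds (intercept, slope).
w, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept_exact, coef_exact = w

print("intercept=", intercept_exact)
print("coef=", coef_exact)
```

Up to floating-point rounding, this should recover the intercept and slope that the question expects.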

With n=10000 data points (sampled with replacement from the 3 original points), SGD gives the following result:

n = 10000

X = np.random.choice([0,0.5,1], n, replace=True)
y = X

X = X.reshape((n,1))

sgd = SGDRegressor(verbose=1)  
sgd.fit(X, y)

# -- Epoch 1
# Norm: 0.86, NNZs: 1, Bias: 0.076159, T: 10000, Avg. loss: 0.012120
# Total training time: 0.04 seconds.
# -- Epoch 2
# Norm: 0.96, NNZs: 1, Bias: 0.024337, T: 20000, Avg. loss: 0.000586
# Total training time: 0.04 seconds.
# -- Epoch 3
# Norm: 0.98, NNZs: 1, Bias: 0.008826, T: 30000, Avg. loss: 0.000065
# Total training time: 0.04 seconds.
# -- Epoch 4
# Norm: 0.99, NNZs: 1, Bias: 0.003617, T: 40000, Avg. loss: 0.000010
# Total training time: 0.04 seconds.
# -- Epoch 5
# Norm: 1.00, NNZs: 1, Bias: 0.001686, T: 50000, Avg. loss: 0.000002
# Total training time: 0.05 seconds.
# -- Epoch 6
# Norm: 1.00, NNZs: 1, Bias: 0.000911, T: 60000, Avg. loss: 0.000000
# Total training time: 0.05 seconds.
# -- Epoch 7
# Norm: 1.00, NNZs: 1, Bias: 0.000570, T: 70000, Avg. loss: 0.000000
# Total training time: 0.05 seconds.
# Convergence after 7 epochs took 0.05 seconds

print("intercept=", sgd.intercept_)
print("coef=", sgd.coef_)
# intercept= [0.00057032]
# coef= [0.99892893]

import matplotlib.pyplot as plt

plt.plot(X, y, 'r.')  # data points
plt.plot(X, sgd.intercept_ + sgd.coef_*X, 'b-')  # fitted line
plt.show()

As n increases in the code above, SGDRegressor begins to converge to the correct optimum; the original answer showed this in an animation.
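Because the resampled data contains only the same three distinct points, what grows with n is mainly the number of updates per epoch. A plausible reading, offered here as an assumption rather than something the answer states, is that with three samples the default stopping rule (`tol`) ends training after only a handful of updates. The sketch below keeps the original three points and instead disables early stopping with `tol=None` and raises `max_iter`; the variable names `sgd_default` and `sgd_long` and the value `max_iter=100000` are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

X = np.array([0, 0.5, 1]).reshape((3, 1))
y = np.array([0, 0.5, 1])

# Default settings: training stops once the loss no longer improves by tol.
sgd_default = SGDRegressor(random_state=0)
sgd_default.fit(X, y)

# tol=None disables that stopping rule, so all max_iter epochs are run.
sgd_long = SGDRegressor(max_iter=100000, tol=None, random_state=0)
sgd_long.fit(X, y)

print("default: epochs=", sgd_default.n_iter_,
      "intercept=", sgd_default.intercept_, "coef=", sgd_default.coef_)
print("long:    epochs=", sgd_long.n_iter_,
      "intercept=", sgd_long.intercept_, "coef=", sgd_long.coef_)
```

The default L2 penalty (`alpha=0.0001`) is left in place, so the long run is expected to land close to, but not exactly at, intercept=0 and coef=1.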

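I tried applying SGD to my data set of 5000 points, but it did not work either. Maybe 5000 is also too small?

That is not far off; roughly 50k samples is probably the minimum at which you should use it. Look into the difference between "stochastic" and "batch" gradient descent.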
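To make the stochastic/batch distinction concrete, here is a minimal NumPy sketch on the same three points. It is not how sklearn implements SGD; the function names, learning rates, and step counts are assumptions chosen for illustration. Batch gradient descent averages the gradient over all samples for each update, while the stochastic variant updates from one randomly chosen sample at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.array([0.0, 0.5, 1.0])
y = X.copy()
# Columns: constant 1 (intercept) and x (slope).
A = np.column_stack([np.ones_like(X), X])

def batch_gd(A, y, lr=0.5, steps=2000):
    """Each update uses the gradient averaged over ALL samples."""
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ w - y) / len(y)
        w -= lr * grad
    return w

def stochastic_gd(A, y, lr=0.1, steps=5000):
    """Each update uses the gradient of ONE randomly chosen sample."""
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        i = rng.integers(len(y))
        grad = A[i] * (A[i] @ w - y[i])
        w -= lr * grad
    return w

w_batch = batch_gd(A, y)
w_sgd = stochastic_gd(A, y)
print("batch      (intercept, coef) =", w_batch)
print("stochastic (intercept, coef) =", w_sgd)
```

Both variants need enough updates to reach the optimum; the difference is that a stochastic update is cheap and noisy, which pays off when the data set is large.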