What determines whether my Python gradient descent algorithm converges?


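I implemented a single-variable linear regression model in Python that uses gradient descent to find the intercept and slope of the best-fit line. (I am using gradient descent rather than computing the optimal intercept and slope directly because I eventually want to generalize to multiple regression.)

The data I am using are below. sales is the dependent variable (in dollars) and temp is the independent variable (in degrees Celsius); think ice cream sales versus temperature, or something similar.

This is my data after normalization: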

sales        temp 
0.06993007  0.174242424
0.326340326 0.340909091
0           0
0.342657343 0.25
0.515151515 0.5
0.785547786 0.772727273
0.529137529 0.568181818
1           1
0.836829837 0.871212121
0.55011655  0.46969697
0.606060606 0.810606061
0.51981352  0.401515152

My algorithm code:

import numpy as np
import pandas as pd
from scipy import stats

class SLRegression(object):
    def __init__(self, learnrate = .01, tolerance = .000000001, max_iter = 10000):

        # Initialize learnrate, tolerance, and max_iter.
        self.learnrate = learnrate
        self.tolerance = tolerance
        self.max_iter = max_iter

    # Define the gradient descent algorithm.
    def fit(self, data):
        # data   :   array-like, shape = [m_observations, 2_columns] 

        # Initialize local variables.
        converged = False
        m = data.shape[0]

        # Track number of iterations.
        self.iter_ = 0

        # Initialize theta0 and theta1.
        self.theta0_ = 0
        self.theta1_ = 0

        # Compute the cost function.
        J = (1.0/(2.0*m)) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0])**2 for i in range(m)])
        print('J is: ', J)

        # Iterate over each point in data and update theta0 and theta1 on each pass.
        while not converged:
            diftemp0 = (1.0/m) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0]) for i in range(m)])
            diftemp1 = (1.0/m) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0]) * data[i][1] for i in range(m)])

            # Subtract the learnrate * partial derivative from theta0 and theta1.
            temp0 = self.theta0_ - (self.learnrate * diftemp0)
            temp1 = self.theta1_ - (self.learnrate * diftemp1)

            # Update theta0 and theta1.
            self.theta0_ = temp0
            self.theta1_ = temp1

            # Compute the updated cost function, given new theta0 and theta1.
            new_J = (1.0/(2.0*m)) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0])**2 for i in range(m)])
            print('New J is: %s' % new_J)

            # Test for convergence.
            if abs(J - new_J) <= self.tolerance:
                converged = True
                print('Model converged after %s iterations!' % self.iter_)

            # Set old cost equal to new cost and update iter.
            J = new_J
            self.iter_ += 1

            # Test whether we have hit max_iter.
            if self.iter_ == self.max_iter:
                converged = True
                print('Maximum iterations have been reached!')

        return self

    def point_forecast(self, x):
        # Given feature value x, returns the regression's predicted value for y.
        return self.theta0_ + self.theta1_ * x


# Run the algorithm on a data set.
if __name__ == '__main__':
    # Load in the .csv file.
    data = np.squeeze(np.array(pd.read_csv('sales_normalized.csv')))

    # Create a regression model with the default learning rate, tolerance, and maximum number of iterations.
    slregression = SLRegression()

    # Call the fit function and pass in the data.
    slregression.fit(data)

    # Print out the results.
    print('After %s iterations, the model converged on Theta0 = %s and Theta1 = %s.' % (slregression.iter_, slregression.theta0_, slregression.theta1_))
    # Compare our model to scipy linregress model.
    slope, intercept, r_value, p_value, slope_std_error = stats.linregress(data[:,1], data[:,0])
    print('Scipy linear regression gives intercept: %s and slope = %s.' % (intercept, slope))

    # Test the model with a point forecast.
    print('As an example, our algorithm gives y = %s given x = .87.' % slregression.point_forecast(.87)) # Should be about .83.
    print('The true y-value for x = .87 is about .8368.')

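Given learnrate = .01, tolerance = .0000000001, and max_iter = 10000, together with the normalized data, I can get the gradient descent algorithm to converge. However, when I use the un-normalized data, the smallest learning rate I can use without the algorithm returning NaN is .005.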
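
It comes down to the way you have set up the algorithm.

Normalizing the data puts the y-intercept of the best fit at around 0.0. Otherwise, the y-intercept could be thousands of units away from your starting guess, and you would have to trek all the way there before the optimization proper really begins.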
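
Does this type of algorithm require normalized data, and if so, why?

No, absolutely not, but if you do not normalize, you should choose a starting point more intelligently (you are starting at (m, b) = (0, 0)). Your learnrate may also be too small if you do not normalize your data, and the same goes for your tolerance.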
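
How large a learnrate is workable also depends on the scale of the data. For batch gradient descent with a fixed step on a least-squares cost, the iteration is stable only when the step is below 2 divided by the largest eigenvalue of the cost's Hessian. The sketch below computes that bound for the normalized temp column from the question and for a rescaled copy of it. The helper name max_stable_learnrate and the rescaling 10 + 30 * temp are assumptions made for illustration; the real un-normalized values are not shown in the question.

```python
import numpy as np

def max_stable_learnrate(x):
    """Upper bound on a fixed learning rate for batch gradient descent on
    J = 1/(2m) * sum((theta0 + theta1*x - y)**2). Hypothetical helper."""
    x = np.asarray(x, dtype=float)
    m = len(x)
    X = np.column_stack([np.ones(m), x])   # design matrix with an intercept column
    H = X.T @ X / m                        # Hessian of J (constant for least squares)
    lam_max = np.linalg.eigvalsh(H).max()  # largest eigenvalue of the symmetric Hessian
    return 2.0 / lam_max

# Normalized temp values copied from the question.
temp_norm = np.array([0.174242424, 0.340909091, 0.0, 0.25, 0.5, 0.772727273,
                      0.568181818, 1.0, 0.871212121, 0.46969697, 0.810606061,
                      0.401515152])

# ASSUMPTION: made-up raw temperatures between 10 and 40 degrees Celsius,
# used only to show the effect of scale.
temp_raw = 10.0 + 30.0 * temp_norm

print('bound for normalized temp:', max_stable_learnrate(temp_norm))
print('bound for rescaled temp:  ', max_stable_learnrate(temp_raw))
```

With these inputs the bound for the normalized column sits well above .01, while the bound for the rescaled column drops below .01, so the same default learnrate that converges on normalized data would diverge on data of that larger scale.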
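
Also, if the algorithm needs normalized values, what is the best way to take a new x value in non-normalized form and plug it into the point forecast?

Apply to your new x-value whatever transformation you applied to the original data to obtain the normalized data. (The code for the normalization is outside of what you have shown.) If this test point falls within the (minx, maxx) range of your original data, then once transformed it should fall within the normalized range (0 to 1 for the data shown).

The difference you are testing is an absolute one: if abs(J - new_J) <= self.tolerance. Note that linear regression has an exact optimal solution (as in scipy.stats.linregress, which has no tolerance parameter). Methods along the lines of general least squares do include tolerance parameters, and those are relative: for example, ftol and xtol.

You can solve multiple regression without gradient descent. The simplest way is probably to solve it with Gaussian elimination. See […] for a more accurate method than that; it is what anything in a statistics package should be using.

Thanks, @Evert. In theory, if new_J gets smaller on every iteration, the abs difference should be the same as the relative one, shouldn't it? But clearly, in that case I would not need it. I think it is probably left over from another way I tried to implement gradient descent.

@mcdowella, thanks for the information. I will read through those links carefully.

This is helpful. The trade-off between finding the region of interest and actually converging is interesting. I had assumed that because the derivative gets smaller as you approach the minimum, you would not need to adjust the learning rate, but I need to think about this further and understand how it works. Thanks for the suggestions. Tracking theta is also a good idea.

Also, here is an example of the "dynamic step size" approach I mentioned.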
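
As a rough illustration of what a dynamic step size can look like (an assumed scheme, not necessarily the one meant above): grow the step slightly after every iteration that lowers the cost, and halve it and retry after any step that raises the cost. The sketch also uses a relative convergence test in place of the absolute one. The function name fit_adaptive, the growth factor 1.05, and the shrink factor 0.5 are all illustrative choices.

```python
import numpy as np

def fit_adaptive(x, y, learnrate=0.01, rel_tol=1e-9, max_iter=10000):
    """Batch gradient descent for y ~ theta0 + theta1*x with a step size that
    adapts to the cost. Hypothetical helper, not code from the question."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.zeros(2)                      # same (0, 0) start as the question

    def cost(t):
        return 0.5 * np.mean((t[0] + t[1] * x - y) ** 2)

    J = cost(theta)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        resid = theta[0] + theta[1] * x - y
        grad = np.array([resid.mean(), (resid * x).mean()])
        candidate = theta - learnrate * grad
        new_J = cost(candidate)
        if new_J > J:
            learnrate *= 0.5                 # overshoot: reject the step, shrink, retry
            continue
        converged = (J - new_J) <= rel_tol * J   # relative change in the cost
        theta, J = candidate, new_J          # accept the step
        learnrate *= 1.05                    # cautiously grow after a success
        if converged:
            break
    return theta, n_iter

# Normalized data copied from the question.
temp = np.array([0.174242424, 0.340909091, 0.0, 0.25, 0.5, 0.772727273,
                 0.568181818, 1.0, 0.871212121, 0.46969697, 0.810606061,
                 0.401515152])
sales = np.array([0.06993007, 0.326340326, 0.0, 0.342657343, 0.515151515,
                  0.785547786, 0.529137529, 1.0, 0.836829837, 0.55011655,
                  0.606060606, 0.51981352])

theta, n_iter = fit_adaptive(temp, sales)
print('theta0 = %s, theta1 = %s after %s iterations' % (theta[0], theta[1], n_iter))
```

Because rejected steps never update theta, the cost cannot increase from one accepted iteration to the next, which makes the choice of the initial learnrate much less critical than with a fixed step.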