Gradient descent on a logarithmically decaying curve in Python

I want to run gradient descent on a logarithmically decaying curve of the form:

y = y0 - a * ln(b + x)

In this example my y0 is 800.

I tried doing this using the partial derivatives with respect to a and b. This should minimize the squared error, but it doesn't converge. I know the code isn't vectorized, and I may be taking entirely the wrong approach. Am I making a simple mistake, or am I getting this whole problem wrong?

import numpy as np

# constants my gradient descent model should find:
a = 4
b = 4

# function to fit on!
def function(x, a, b):
    y0 = 800
    return y0 - a * np.log(b + x)

# Generates data
def gen_data(numpoints):
    a = 4
    b = 4
    x = np.array(range(0, numpoints))
    y = function(x, a, b)
    return x, y
x, y = gen_data(600)

def grad_model(x, y, iterations):
    converged = False

    # length of dataset
    m = len(x)

    # guess   a ,  b
    theta = [0.1, 0.1]
    alpha = 0.001

    # initial error
    e = np.sum((np.square(function(x, theta[0], theta[1])) - y))

    for iteration in range(iterations):
        hypothesis = function(x, theta[0], theta[1])
        loss = hypothesis - y

        # compute partial derivatives to find the slope to "fall" along
        theta0_grad = (np.mean(np.sum(-np.log(x + y)))) / (m)
        theta1_grad = (np.mean((((np.log(theta[1] + x)) / theta[0]) - (x*(np.log(theta[1] + x)) / theta[0])))) / (2*m)

        theta0 = theta[0] - (alpha * theta0_grad)
        theta1 = theta[1] - (alpha * theta1_grad)

        theta[1] = theta1
        theta[0] = theta0

        new_e = np.sum(np.square((function(x, theta[0], theta[1])) - y))
        if new_e > e:
            print "AHHHH!"
            print "Iteration: "+ str(iteration)
            break
        print theta
    return theta[0], theta[1]

I found a few mistakes in your code. The line

e = np.sum((np.square(function(x, theta[0], theta[1])) - y))
is incorrect and should be replaced with

e = np.sum((np.square(function(x, theta[0], theta[1]) - y)))
The formula for new_e contains the same mistake.

Also, the gradient formulas are wrong. Your loss function is $L(a,b)=\sum_{i=1}^N \left(y_0 - a\log(b+x_i) - y_i\right)^2$, so you have to compute the partial derivatives of $L$ with respect to $a$ and $b$. (Does LaTeX really not work on Stack Overflow?) One last point: gradient descent only tolerates a limited step size, so the step must not be too large. Below are the worked derivatives, followed by a version of the code that works better:
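
Applying the chain rule to the squared-error loss above gives (these are exactly the theta0_grad and theta1_grad lines in the code below):

$$\frac{\partial L}{\partial a} = \sum_{i=1}^N 2\,\bigl(y_0 - a\log(b+x_i) - y_i\bigr)\bigl(-\log(b+x_i)\bigr)$$

$$\frac{\partial L}{\partial b} = \sum_{i=1}^N 2\,\bigl(y_0 - a\log(b+x_i) - y_i\bigr)\left(-\frac{a}{b+x_i}\right)$$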

import numpy as np
import matplotlib.pyplot as plt

# constants my gradient descent model should find:
a = 4.0
b = 4.0
y0 = 800.0

# function to fit on!
def function(x, a, b):
    # y0 = 800
    return y0 - a * np.log(b + x)

# Generates data
def gen_data(numpoints):
    # a = 4
    # b = 4
    x = np.array(range(0, numpoints))
    y = function(x, a, b)
    return x, y
x, y = gen_data(600)

def grad_model(x, y, iterations):
    converged = False

    # length of dataset
    m = len(x)

    # guess   a ,  b
    theta = [0.1, 0.1]
    alpha = 0.00001

    # initial error
    # e = np.sum((np.square(function(x, theta[0], theta[1])) - y))    #  This was a bug
    e = np.sum((np.square(function(x, theta[0], theta[1]) - y)))

    costs = np.zeros(iterations)

    for iteration in range(iterations):
        hypothesis = function(x, theta[0], theta[1])
        loss = hypothesis - y

        # compute partial derivatives to find the slope to "fall" along
        # theta0_grad = (np.mean(np.sum(-np.log(x + y)))) / (m)
        # theta1_grad = (np.mean((((np.log(theta[1] + x)) / theta[0]) - (x*(np.log(theta[1] + x)) / theta[0])))) / (2*m)
        theta0_grad = 2*np.sum((y0 - theta[0]*np.log(theta[1] + x) - y)*(-np.log(theta[1] + x)))
        theta1_grad = 2*np.sum((y0 - theta[0]*np.log(theta[1] + x) - y)*(-theta[0]/(theta[1] + x)))  # use theta[1], the current estimate of b, not the true b

        theta0 = theta[0] - (alpha * theta0_grad)
        theta1 = theta[1] - (alpha * theta1_grad)

        theta[1] = theta1
        theta[0] = theta0

        # new_e = np.sum(np.square((function(x, theta[0], theta[1])) - y)) # This was a bug
        new_e = np.sum(np.square((function(x, theta[0], theta[1]) - y)))
        costs[iteration] = new_e
        if new_e > e:
            print "AHHHH!"
            print "Iteration: "+ str(iteration)
            # break
        print theta
    return theta[0], theta[1], costs

(theta0, theta1, costs) = grad_model(x, y, 100000)
plt.semilogy(costs)
plt.show()
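
Since the comment thread below turns on whether the derivatives are coded correctly, it is worth checking the analytic gradient against a finite-difference approximation before tuning anything else. A minimal sketch against the same squared-error cost (cost and numeric_grad are hypothetical helper names, not part of the original code):

import numpy as np

def cost(x, y, theta, y0=800.0):
    # squared-error cost: sum_i (y0 - a*log(b + x_i) - y_i)^2
    return np.sum((y0 - theta[0] * np.log(theta[1] + x) - y) ** 2)

def numeric_grad(x, y, theta, eps=1e-6):
    # central finite differences, one coordinate at a time
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (cost(x, y, theta + step) - cost(x, y, theta - step)) / (2 * eps)
    return g

If numeric_grad(x, y, [0.1, 0.1]) agrees with (theta0_grad, theta1_grad) evaluated at the same point, the analytic formulas are almost certainly right.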

Yes, I run into trouble whenever I go beyond standard linear gradient descent, and I don't quite know how to work through it.

I haven't really read the code, but what do you mean by it not converging? Is the error growing larger and larger, i.e. diverging? Or is it just taking too long to converge? Assuming you coded the derivatives correctly, it could simply be that you picked a bad alpha, or that the gradient has a flipped sign (+ instead of -).

I added a break in the code in case my error diverges. I believe the partial derivative for the theta[0] (a) variable is correct, but not the one for theta[1] (b). It seems to converge correctly, but only for theta[0].

Thank you so much! Works like a charm! Is there a standard procedure to follow for finding the right step size?
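
For what it's worth, one common recipe for picking the step size automatically is a backtracking line search: start from a largish trial step and shrink it until the cost actually decreases. A minimal sketch against the same squared-error cost (backtracking_step and cost are hypothetical helpers, not from the code above):

import numpy as np

def cost(x, y, theta, y0=800.0):
    # squared-error cost: sum_i (y0 - a*log(b + x_i) - y_i)^2
    return np.sum((y0 - theta[0] * np.log(theta[1] + x) - y) ** 2)

def backtracking_step(x, y, theta, grad, alpha0=1.0, shrink=0.5, max_tries=50):
    # shrink the step until the cost decreases; theta and grad are numpy arrays
    alpha = alpha0
    c0 = cost(x, y, theta)
    for _ in range(max_tries):
        candidate = theta - alpha * grad
        if cost(x, y, candidate) < c0:  # a nan cost (e.g. log of a negative) fails this test and triggers further shrinking
            return candidate, alpha
        alpha *= shrink
    return theta, 0.0  # no improving step found; a signal to stop

Calling this once per iteration instead of using a fixed alpha removes most of the manual tuning, at the price of a few extra cost evaluations per step.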