Gradient descent using Python and NumPy

python numpy machine-learning

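The theta from my code (the gradient function listed further below) is 100.2 100.2, but it should be 100.2 61.09, which is the correct result I get in MATLAB.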

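I think your code is a bit too complicated and needs more structure; otherwise you will get lost in all the equations and operations. In the end, this regression boils down to four operations:

1. Calculate the hypothesis h = X * theta
2. Calculate the loss = h - y, and maybe the squared cost (loss^2) / (2m)
3. Calculate the gradient = X' * loss / m
4. Update the parameters theta = theta - alpha * gradient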
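The four operations above can be sketched as a single update step. This is only a minimal illustration: the function name gradient_step and the three-point dataset are made up for the example and are not part of the question or the answer.

```python
import numpy as np

def gradient_step(X, y, theta, alpha):
    """One batch gradient-descent update for linear regression."""
    m = X.shape[0]                      # number of training examples (rows)
    h = X @ theta                       # 1. hypothesis, shape (m,)
    loss = h - y                        # 2. per-example error, shape (m,)
    cost = np.sum(loss ** 2) / (2 * m)  #    squared cost before the update
    gradient = X.T @ loss / m           # 3. average gradient, shape (n,)
    theta = theta - alpha * gradient    # 4. parameter update
    return theta, cost

# Hypothetical toy data: y = 1 + 2 * x exactly, bias column first.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta, cost = gradient_step(X, y, np.zeros(2), alpha=0.1)
print(theta, cost)
```

Note that m is taken from the number of rows of X, so it cannot be mixed up with the number of features n.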

In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.

Let's have a look at my variation of your code. For reference, this is the code from the question:

def gradient(X_norm,y,theta,alpha,m,n,num_it):
    temp=np.array(np.zeros_like(theta,float))
    for i in range(0,num_it):
        h=np.dot(X_norm,theta)
        #temp[j]=theta[j]-(alpha/m)*(  np.sum( (h-y)*X_norm[:,j][np.newaxis,:] )  )
        temp[0]=theta[0]-(alpha/m)*(np.sum(h-y))
        temp[1]=theta[1]-(alpha/m)*(np.sum((h-y)*X_norm[:,1]))
        theta=temp
    return theta



X_norm,mean,std=featureScale(X)
#length of X (number of rows)
m=len(X)
X_norm=np.array([np.ones(m),X_norm])
n,m=np.shape(X_norm)
num_it=1500
alpha=0.01
theta=np.zeros(n,float)[:,np.newaxis]
X_norm=X_norm.transpose()
theta=gradient(X_norm,y,theta,alpha,m,n,num_it)
print theta

My variation first generates a small random dataset and then runs batch gradient descent on it:

import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta


def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations= 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)

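In a plot of this dataset (the figure is not preserved here), I also added the generated regression line and the formula that was calculated by Excel.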

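You need to take care of the intuition behind regression using gradient descent. As you do a complete batch pass over your data X, you need to reduce the m losses of the individual examples to a single weight update. In this case, that update is the average of the per-example gradients, hence the division by m.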
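To make the averaging concrete, here is a small sketch showing that the mean of the per-example gradients is the same thing as the vectorized expression. The data and the starting theta are made-up values used only for illustration.

```python
import numpy as np

# Hypothetical toy data with a bias column of ones.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = np.array([0.5, 0.5])

loss = X @ theta - y                  # one error per example, shape (m,)
per_example = loss[:, None] * X       # one gradient per example, shape (m, n)
averaged = per_example.mean(axis=0)   # reduce m gradients to a single update
vectorized = X.T @ loss / len(y)      # the same reduction in one expression

print(averaged, vectorized)
```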

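The next thing you need to take care of is tracking the convergence and adjusting the learning rate. For that, you should always track the cost at every iteration, and maybe even plot it.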
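A rough sketch of what tracking the cost could look like. The function name gradient_descent_tracked, the tolerance, and the four-point dataset are assumptions made for this example; they are not taken from the answer.

```python
import numpy as np

def gradient_descent_tracked(X, y, alpha, max_iter=5000, tol=1e-12):
    """Batch gradient descent that records the cost at every iteration."""
    m, n = X.shape
    theta = np.zeros(n)
    costs = []
    for _ in range(max_iter):
        loss = X @ theta - y
        costs.append(np.sum(loss ** 2) / (2 * m))
        # Stop once the cost no longer changes by more than tol.
        if len(costs) > 1 and abs(costs[-2] - costs[-1]) < tol:
            break
        theta = theta - alpha * (X.T @ loss) / m
    return theta, costs

# Hypothetical noise-free data: y = 1 + 2 * x.
X = np.column_stack([np.ones(4), np.arange(4.0)])
y = 1.0 + 2.0 * np.arange(4.0)
theta, costs = gradient_descent_tracked(X, y, alpha=0.1)

# A cost that ever rises between iterations usually means alpha is too large.
cost_never_rises = bool(np.all(np.diff(costs) <= 0))
print(len(costs), theta, cost_never_rises)
```

The costs list can be passed straight to a plotting call such as matplotlib's plt.plot(costs) to see the shape of the convergence curve.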

If you run my example, the last lines of output and the returned theta will look like this:

Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368   1.01108458]

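This is actually quite close to the equation that was calculated by Excel (y = x + 30). Note that because we passed the bias feature in the first column, the first theta value denotes the bias weight.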
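If Excel is not at hand, one way to get a reference value is an ordinary least-squares solve. The sketch below uses np.linalg.lstsq, which is not gradient descent, and a noise-free line y = x + 30 chosen purely for illustration (it is not the random dataset from the answer).

```python
import numpy as np

x_raw = np.arange(10.0)
# Column 0 is the bias feature (all ones), column 1 is the actual input.
X = np.column_stack([np.ones_like(x_raw), x_raw])
y = 30.0 + 1.0 * x_raw  # hypothetical noise-free targets

# Closed-form least-squares fit; theta_exact[0] is the bias, theta_exact[1] the slope.
theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_exact)
```

Because the ones are in the first column, the first coefficient comes out as the intercept, which is the same convention as the theta printed above.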

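Below you can find my implementation of gradient descent for the linear regression problem.

First, you calculate the gradient as X.T * (X * w - y) / N and, at the same time, update the current weights w (theta) with this gradient.

X: feature matrix
y: target values
w: weights/values
N: size of the training set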
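A quick way to convince yourself that this expression really is the gradient of the half mean squared error is to compare it with finite differences. This is a sketch under my own assumptions: it uses plain 1-D NumPy arrays instead of np.matrix, and the helper names and random data are invented for the check.

```python
import numpy as np

def cost(w, X, y):
    err = X @ w - y
    return np.sum(err ** 2) / (2 * len(y))

def analytic_gradient(w, X, y):
    # The formula from the text: X.T * (X * w - y) / N
    return X.T @ (X @ w - y) / len(y)

def numeric_gradient(w, X, y, eps=1e-6):
    # Central differences, one coordinate of w at a time.
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (cost(w + step, X, y) - cost(w - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = rng.normal(size=20)
w = np.array([0.5, -1.5])

analytic = analytic_gradient(w, X, y)
numeric = numeric_gradient(w, X, y)
print(analytic, numeric)
```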

Here is the Python code:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import random

def generateSample(N, variance=100):
    X = np.matrix(range(N)).T + 1
    Y = np.matrix([random.random() * variance + i * 10 + 900 for i in range(len(X))]).T
    return X, Y

def fitModel_gradient(x, y):
    N = len(x)
    w = np.zeros((x.shape[1], 1))
    eta = 0.0001

    maxIteration = 100000
    for i in range(maxIteration):
        error = x * w - y
        gradient = x.T * error / N
        w = w - eta * gradient
    return w

def plotModel(x, y, w):
    plt.plot(x[:,1], y, "x")
    plt.plot(x[:,1], x * w, "r-")
    plt.show()

def test(N, variance, modelFunction):
    X, Y = generateSample(N, variance)
    X = np.hstack([np.matrix(np.ones(len(X))).T, X])
    w = modelFunction(X, Y)
    plotModel(X, Y, w)


test(50, 600, fitModel_gradient)
test(50, 1000, fitModel_gradient)
test(100, 200, fitModel_gradient)

I know this question has already been answered, but I have made some updates to the GD function:

import numpy as np

### COST FUNCTION
def cost(theta, X, y):
    ### Evaluate half MSE (mean squared error)
    m = len(y)
    error = np.dot(X, theta) - y
    J = np.sum(error ** 2) / (2 * m)
    return J

def GD(X, y, theta, alpha):
    # history of accepted steps (lists, so that append works)
    cost_histo = []
    theta_histo = []

    # an arbitrary gradient, to pass the initial while() check
    delta = np.ones(X.shape[1])
    # cost at the initial theta
    old_cost = cost(theta, X, y)

    while np.max(np.abs(delta)) > 1e-6:
        error = np.dot(X, theta) - y
        delta = np.dot(np.transpose(X), error) / len(y)
        trial_theta = theta - alpha * delta
        trial_cost = cost(trial_theta, X, y)
        # halve the step until the cost goes down (capped so it cannot loop forever)
        halvings = 0
        while trial_cost >= old_cost and halvings < 50:
            trial_theta = (theta + trial_theta) / 2
            trial_cost = cost(trial_theta, X, y)
            halvings += 1
        if trial_cost >= old_cost:
            # no smaller step lowers the cost any further: treat as converged
            break
        cost_histo.append(trial_cost)
        theta_histo.append(trial_theta)
        old_cost = trial_cost
        theta = trial_theta
    Intercept = theta[0]
    Slope = theta[1]
    return [Intercept, Slope]

# X (with a leading column of ones), y, theta and alpha are assumed to be defined already
cost(theta, X, y)
res = GD(X, y, theta, alpha)

This function reduces alpha over the iterations, which makes it converge faster. There is an example of this in R; I applied the same logic in Python.

Following @thomas jungblut's implementation in Python, I did the same in Octave. If you find something wrong, please let me know and I will fix and update.

The data comes from a txt file with the following rows:

Think of it as a very rough sample with the features [number of bedrooms] and [mts2], and a last column [rent price], which is what we want to predict.

Here is the Octave implementation:

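Semicolons are ignored in Python, and indentation is fundamental.

In gradientDescent, is / 2 * m supposed to be / (2 * m)?

Using loss for the absolute difference isn't a very good idea, as "loss" is usually a synonym of "cost". You also don't need to pass m at all; NumPy arrays know their own shape.

Can someone please explain how the partial derivative of the cost function equals np.dot(xTrans, loss) / m?

@Saurabh Verma: Before I explain the details, first, this statement: np.dot(xTrans, loss) / m is a matrix computation that computes the gradient over all pairs of training data and labels in one line. The result is a vector with one entry per parameter (size n), not one per example. Back to basics: if we take the partial derivative of a single squared error with respect to, say, theta[j], we differentiate (np.dot(x[i], theta) - y[i]) ** 2 w.r.t. theta[j]. Note that theta is a vector. The result should be 2 * (np.dot(x[i], theta) - y[i]) * x[i][j]. You can confirm this by hand.

Instead of keeping a separate xTrans = x.transpose(), you can just use x.T every time xTrans is used; in NumPy both are views of the same data, so neither copies it. x just needs to be Fortran ordered for efficient memory access.

@Muatik I don't understand how you get the gradient as the inner product of the error and the training set:

gradient = x.T * error / N

What is the logic behind this?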
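One way to see it, sketched under the half mean squared error cost used in the answers above (the notation with x^{(i)} for the i-th row of X and m = N for the number of examples is my own):

```latex
\begin{align*}
J(\theta) &= \frac{1}{2m} \sum_{i=1}^{m} \left( x^{(i)} \cdot \theta - y^{(i)} \right)^2 \\
\frac{\partial J}{\partial \theta_j}
  &= \frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)} \cdot \theta - y^{(i)} \right) x^{(i)}_j \\
\nabla_\theta J &= \frac{1}{m} X^{\top} (X\theta - y)
\end{align*}
```

The factor 2 from differentiating the square cancels the 2 in 2m. Entry j of X^T (X theta - y) is exactly the sum over all examples of the error times feature j of that example, so stacking the partial derivatives for every j gives the single matrix product x.T * error / N.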