Gradient descent using Python and NumPy


    import numpy as np

    def gradient(X_norm,y,theta,alpha,m,n,num_it):
        temp=np.array(np.zeros_like(theta,float))
        for i in range(0,num_it):
            h=np.dot(X_norm,theta)
            #temp[j]=theta[j]-(alpha/m)*(  np.sum( (h-y)*X_norm[:,j][np.newaxis,:] )  )
            temp[0]=theta[0]-(alpha/m)*(np.sum(h-y))
            temp[1]=theta[1]-(alpha/m)*(np.sum((h-y)*X_norm[:,1]))
            theta=temp
        return theta


    # featureScale is assumed to be defined elsewhere: it normalizes X and returns
    # the scaled features together with their mean and standard deviation
    X_norm,mean,std=featureScale(X)
    #length of X (number of rows)
    m=len(X)
    X_norm=np.array([np.ones(m),X_norm])
    n,m=np.shape(X_norm)
    num_it=1500
    alpha=0.01
    theta=np.zeros(n,float)[:,np.newaxis]
    X_norm=X_norm.transpose()
    theta=gradient(X_norm,y,theta,alpha,m,n,num_it)
    print(theta)

The theta from the above code is 100.2 100.2, but it should be 100.2 61.09 (the MATLAB result), which is correct.

I think your code is a bit too complicated and it needs more structure, because otherwise you'll be lost in all the equations and operations. In the end, this regression boils down to four operations (a minimal one-iteration sketch follows the list below):

  • Compute the hypothesis: h = X * theta
  • Compute the loss: loss = h - y, and maybe the squared cost: (loss^2) / (2*m)
  • Compute the gradient: gradient = X' * loss / m
  • Update the parameters: theta = theta - alpha * gradient
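
As a minimal one-iteration sketch of those four operations in plain NumPy (illustrative only; the function name and arguments are my own, and X is assumed to already carry a leading column of ones):

    import numpy as np

    def one_gradient_step(X, y, theta, alpha):
        m = len(y)                       # number of training examples
        h = X.dot(theta)                 # hypothesis
        loss = h - y                     # residuals; the cost would be np.sum(loss ** 2) / (2 * m)
        grad = X.T.dot(loss) / m         # gradient averaged over all examples
        return theta - alpha * grad      # parameter update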
In your case, I guess you have confused m with n. Here, m denotes the number of examples in your training set, not the number of features.

Let's have a look at my variation of your code:

    
At first, I create a small random dataset, which is generated as follows:

    import numpy as np
    import random
    
    # m denotes the number of examples here, not the number of features
    def gradientDescent(x, y, theta, alpha, m, numIterations):
        xTrans = x.transpose()
        for i in range(0, numIterations):
            hypothesis = np.dot(x, theta)
            loss = hypothesis - y
            # avg cost per example (the 2 in 2*m doesn't really matter here.
            # But to be consistent with the gradient, I include it)
            cost = np.sum(loss ** 2) / (2 * m)
            print("Iteration %d | Cost: %f" % (i, cost))
            # avg gradient per example
            gradient = np.dot(xTrans, loss) / m
            # update
            theta = theta - alpha * gradient
        return theta
    
    
    def genData(numPoints, bias, variance):
        x = np.zeros(shape=(numPoints, 2))
        y = np.zeros(shape=numPoints)
        # basically a straight line
        for i in range(0, numPoints):
            # bias feature
            x[i][0] = 1
            x[i][1] = i
            # our target variable
            y[i] = (i + bias) + random.uniform(0, 1) * variance
        return x, y
    
    # gen 100 points with a bias of 25 and 10 variance as a bit of noise
    x, y = genData(100, 25, 10)
    m, n = np.shape(x)
    numIterations= 100000
    alpha = 0.0005
    theta = np.ones(n)
    theta = gradientDescent(x, y, theta, alpha, m, numIterations)
    print(theta)
    

As you can see, I also added the generated regression line and the formula that was calculated by Excel.
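
For reference, the data and the fitted regression line can be drawn like this (a sketch of my own, reusing x, y, and the fitted theta from the script above):

    from matplotlib import pyplot as plt

    plt.scatter(x[:, 1], y, marker="x")     # the generated points
    plt.plot(x[:, 1], x.dot(theta), "r-")   # fitted line: theta[0] + theta[1] * x
    plt.show()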

You need to take care about the intuition of regression using gradient descent. As you do a complete batch pass over your data X, you need to reduce the m losses of every example to a single weight update. In this case, this is the average of the sum over the gradients, thus the division by m.
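
Written out (standard least-squares algebra, spelled out here only to make the division by m explicit), the cost and its gradient are:

    J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(x^{(i)}\theta - y^{(i)}\bigr)^{2},
    \qquad
    \frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\bigl(x^{(i)}\theta - y^{(i)}\bigr)\,x_j^{(i)},
    \qquad
    \nabla_{\theta} J = \frac{1}{m}\,X^{\top}(X\theta - y)

The last expression is exactly gradient = np.dot(xTrans, loss) / m in the code above.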

The next thing you need to take care of is tracking convergence and adjusting the learning rate. For that, you should always track your cost on every iteration, and maybe even plot it.
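
One way to do that (a sketch of my own, reusing x and y from genData above; gradient_descent_with_history is not part of the original answer) is to collect the cost on every iteration and plot it afterwards:

    from matplotlib import pyplot as plt
    import numpy as np

    def gradient_descent_with_history(x, y, theta, alpha, num_iterations):
        m = len(y)
        costs = []
        for _ in range(num_iterations):
            loss = x.dot(theta) - y
            costs.append(np.sum(loss ** 2) / (2 * m))   # track the cost of the current theta
            theta = theta - alpha * x.T.dot(loss) / m   # same update as gradientDescent above
        return theta, costs

    theta, costs = gradient_descent_with_history(x, y, np.ones(x.shape[1]), 0.0005, 1000)
    plt.plot(costs)                                     # the curve should flatten once theta converges
    plt.xlabel("iteration")
    plt.ylabel("cost")
    plt.show()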

If you run my example, the theta returned will look like this:

    Iteration 99997 | Cost: 47883.706462
    Iteration 99998 | Cost: 47883.706462
    Iteration 99999 | Cost: 47883.706462
    [ 29.25567368   1.01108458]

This is actually quite close to the equation that was calculated by Excel (y = x + 30). Note that, as we passed the bias into the first column, the first theta value denotes the bias weight.

Below you can find my implementation of gradient descent for the linear regression problem.

At first, compute the gradient like X.T*(X*w-y)/N, and update the current theta with this gradient simultaneously, where:

  • X: feature matrix
  • y: target values
  • w: weights / values
  • N: size of the training set

Here is the Python code:


    import pandas as pd
    import numpy as np
    from matplotlib import pyplot as plt
    import random
    
    def generateSample(N, variance=100):
        X = np.matrix(range(N)).T + 1
        Y = np.matrix([random.random() * variance + i * 10 + 900 for i in range(len(X))]).T
        return X, Y
    
    def fitModel_gradient(x, y):
        N = len(x)
        w = np.zeros((x.shape[1], 1))
        eta = 0.0001
    
        maxIteration = 100000
        for i in range(maxIteration):
            error = x * w - y
            gradient = x.T * error / N
            w = w - eta * gradient
        return w
    
    def plotModel(x, y, w):
        plt.plot(x[:,1], y, "x")
        plt.plot(x[:,1], x * w, "r-")
        plt.show()
    
    def test(N, variance, modelFunction):
        X, Y = generateSample(N, variance)
        X = np.hstack([np.matrix(np.ones(len(X))).T, X])
        w = modelFunction(X, Y)
        plotModel(X, Y, w)
    
    
    test(50, 600, fitModel_gradient)
    test(50, 1000, fitModel_gradient)
    test(100, 200, fitModel_gradient)
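
A side note of my own, not part of the answer above: np.matrix is discouraged in current NumPy, and the same fit can be written with plain arrays and the @ operator. A rough equivalent sketch (the function name is mine):

    import numpy as np

    def fitModel_gradient_array(x, y, eta=0.0001, maxIteration=100000):
        # x: (N, d) array with a leading column of ones, y: (N,) array of targets
        N = len(x)
        w = np.zeros(x.shape[1])
        for i in range(maxIteration):
            error = x @ w - y                 # residuals
            gradient = x.T @ error / N        # averaged gradient, as in fitModel_gradient
            w = w - eta * gradient            # gradient step
        return w

The driver code stays the same apart from building X and Y as plain arrays (for example with np.column_stack) instead of np.matrix objects.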
    

I know this question has already been answered, but I have made some updates to the GD function:

      ### COST FUNCTION
    
    def cost(theta,X,y):
         ### Evaluate half MSE (Mean square error)
         m = len(y)
         error = np.dot(X,theta) - y
         J = np.sum(error ** 2)/(2*m)
         return J
    
    cost(theta,X,y)
    
    
    
    def GD(X,y,theta,alpha):
    
        cost_histo = [0]
        theta_histo = [0]
    
        # an arbitrary gradient, to pass the initial while() check
        delta = [np.repeat(1,len(X))]
        # Cost at the initial theta
        old_cost = cost(theta,X,y)
    
        while (np.max(np.abs(delta)) > 1e-6):
            error = np.dot(X,theta) - y
            delta = np.dot(np.transpose(X),error)/len(y)
            trial_theta = theta - alpha * delta
            trial_cost = cost(trial_theta,X,y)
            while (trial_cost >= old_cost):
                trial_theta = (theta +trial_theta)/2
                trial_cost = cost(trial_theta,X,y)
                cost_histo.append(trial_cost)
                theta_histo.append(trial_theta)
            old_cost = trial_cost
            theta = trial_theta
        Intercept = theta[0] 
        Slope = theta[1]  
        return [Intercept,Slope]
    
    res = GD(X,y,theta,alpha)
    
This function reduces the alpha value over the iterations, which makes it converge faster: whenever a trial step does not lower the cost, the inner while() halves the step back toward the current theta, so the effective learning rate shrinks as needed. See an example of the same approach in R; I applied the same logic here in Python.

Following @thomas jungblut's implementation in Python, I wrote the same gradient-descent routine as an Octave implementation; if you find something broken, please let me know and I will fix and update it.

The data comes from a txt file with the following rows (think of it as a very rough sample of the features [number of bedrooms] [mts2] and the last column [rent price], which is what we want to predict):
    1 10 1000
    2 20 2500
    3 25 3500
    4 40 5500
    5 60 6200
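
As an aside of my own (not part of either answer), a file in that format could be loaded for the NumPy code above along these lines, assuming it is saved as data.txt:

    import numpy as np

    data = np.loadtxt("data.txt")   # columns: bedrooms, mts2, rent price
    X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])   # prepend a bias column
    y = data[:, 2]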
    

  • Semicolons are ignored in Python, and indentation is fundamental. In gradientDescent, shouldn't /2*m be /(2*m)? Using loss for the absolute difference isn't a great idea either, as "loss" is usually a synonym for "cost". You also don't need to pass m at all; NumPy arrays know their own shape.

  • Can someone explain how the partial derivative of the cost function ends up equal to the expression np.dot(xTrans, loss) / m?

  • @Saurabh Verma: Before I explain the details, first, this statement: np.dot(xTrans, loss) / m is a matrix calculation that computes the gradient for all pairs of training data and labels in one line. The result is a vector of size (m by 1). Back to basics: if we take the partial derivative of the squared error with respect to, say, theta[j], we take the derivative of the function (np.dot(x[i], theta) - y[i]) ** 2 with respect to theta[j]. Note that theta is a vector. The result should be 2 * (np.dot(x[i], theta) - y[i]) * x[j]. You can confirm this by hand.

  • Instead of xTrans = x.transpose(), which unnecessarily duplicates the data, you can just use x.T every time xTrans is used; x only needs to be Fortran ordered for efficient memory access.

  • @Muatik I don't understand how you get the gradient as the inner product of the error and the training set: gradient = x.T * error / N. What is the logic behind this?