Matlab: What is the error in the following iterative implementation of the gradient descent algorithm?

matlab machine-learning octave

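I tried to implement an iterative (loop-based) version of the gradient descent algorithm, but it does not work correctly. The vectorized implementation of the same algorithm, however, works fine.
Here is the iterative implementation: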

function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)

    % get the number of rows and columns
    nrows = size(X, 1);
    ncols = size(X, 2);

    % initialize the hypothesis vector
    h = zeros(nrows, 1);

    % initialize the temporary theta vector
    theta_temp = zeros(ncols, 1);

    % run gradient descent for the specified number of iterations
    count = 1;

    while count <= iterations

        % calculate the hypothesis values and fill into the vector
        for i = 1 : nrows
            for j = 1 : ncols
                term = theta(j) * X(i, j);
                h(i) = h(i) + term;
            end
        end

        % calculate the gradient
        for j = 1 : ncols
            for i = 1 : nrows
                term = (h(i) - y(i)) * X(i, j);
                theta_temp(j) = theta_temp(j) + term;
            end
        end

        % update the gradient with the factor
        fact = alpha / nrows;

        for i = 1 : ncols
            theta_temp(i) = fact * theta_temp(i);
        end

        % update the theta
        for i = 1 : ncols
            theta(i) = theta(i) - theta_temp(i);
        end

        % update the count
        count += 1;
    end
end

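The problem with the first code is that the theta_temp and h vectors are not initialized correctly. The first iteration (when count equals 1) runs correctly, because for that particular iteration h and theta_temp have been correctly initialized to 0. However, these are temporary vectors for each iteration of gradient descent, and they are never reset to zero vectors in the subsequent iterations. That is, in iteration 2 the new values of h(i) and theta_temp(i) are simply added on top of the old ones, so the code does not work correctly. You need to reset both vectors to zero at the start of every iteration, and then the code works. My corrected version of your first function is the second gradientDescent_i listing below; note where h and theta_temp are now initialized.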

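Have you tried stepping through the code and comparing the results of each iteration step between the two algorithms? Is the second iteration the one that goes wrong? If so, check whether theta_temp is initialized correctly. Note that both versions are iterative; you cannot do gradient descent without iterating.

Also, are you sure you wrote count += 1 in the "non-vectorized" implementation? As far as I know, that is not valid Matlab syntax.

@FlorisSA += is valid in Octave, which is also tagged, so I assume the OP is actually using Octave.

@CrisLuengo Yes. First I checked the output of the hypothesis vector for both codes (at each step), and it turned out to be correct, so I guess the code goes wrong at the gradient calculation step, since that is where the vector values start to differ in each iteration (the first code is the wrong one). Maybe I initialized theta_temp incorrectly, so I will check and confirm.

By iterating, the first code uses matrix multiplication for all computations, which is much faster than the second code (of course, the gradient update has to be iterative). You could call it "loop code" to avoid confusion. In any case, I haven't looked at your code in enough detail to understand what it does, but one thing that jumps out at me is that theta_temp is initialized to zero before the first iteration, but not between iterations. Maybe that is intentional, but it looks like it could be a bug.
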
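To make the accumulation effect concrete, here is a minimal sketch (not taken from the question) that computes the hypothesis twice, once with a vector that is zeroed only before the loop and once with a vector that is zeroed on every pass. The toy matrix, the fixed theta, and the names h_stale and h_fresh are all assumptions for illustration; theta is deliberately held constant so that only the missing reset changes the result.

```matlab
% Hypothetical toy data; theta is held fixed so only the reset matters
X = [1 1; 1 2; 1 3];
theta = [1; 1];
nrows = size(X, 1);
ncols = size(X, 2);

% initialized once, outside the loop (the pattern used in the question)
h_stale = zeros(nrows, 1);

for count = 1 : 2
    % reset on every pass (the pattern used in the corrected code)
    h_fresh = zeros(nrows, 1);

    for i = 1 : nrows
        for j = 1 : ncols
            term = theta(j) * X(i, j);
            h_stale(i) = h_stale(i) + term;
            h_fresh(i) = h_fresh(i) + term;
        end
    end
end

disp([h_fresh, h_stale, X * theta]);
```

After the second pass h_fresh still equals X * theta, while h_stale holds the sum of both passes, so every gradient computed from it would be wrong.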
function [theta, theta_all, J_cost] = gradientDescent(X, y, theta, alpha)

    % set the learning rate
    learn_rate = alpha;

    % set the number of iterations
    n = 1500;

    % number of training examples
    m = length(y);

    % initialize the theta_new vector
    l = length(theta);
    theta_new = zeros(l,1);

    % initialize the cost vector
    J_cost = zeros(n,1);

    % initialize the vector to store all the calculated theta values
    theta_all = zeros(n,2);

    % perform gradient descent for the specified number of iterations
    for i = 1 : n

        % calculate the hypothesis
        hypothesis = X * theta;

        % calculate the error
        err = hypothesis - y;

        % calculate the gradient
        grad = X' * err;

        % calculate the new theta
        theta_new = (learn_rate/m) .* grad;

        % update the old theta
        theta = theta - theta_new;

        % update the cost
        J_cost(i) = computeCost(X, y, theta);

        % store the calculated theta value
        if i < n
            index = i + 1;
            theta_all(index,:) = theta';
        end
    end
end
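The function above calls computeCost, which is not shown anywhere on this page. As a rough sketch only, and assuming it is the usual squared-error cost for linear regression, J = sum((X*theta - y).^2) / (2*m), it could look like the anonymous function below. That formula and the toy inputs are assumptions, not the asker's actual code.

```matlab
% ASSUMPTION: squared-error cost; the original computeCost is not shown
computeCost = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));

% hypothetical toy data: y equals the second column of X exactly
X = [1 1; 1 2];
y = [1; 2];

J_zero = computeCost(X, y, [0; 0]);   % cost at theta = 0
J_fit  = computeCost(X, y, [0; 1]);   % cost at the exact fit

disp([J_zero, J_fit]);
```

With a suitably small learning rate the values stored in J_cost should decrease from one iteration to the next, which makes this a handy check when the loop version and the vectorized version disagree.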

function [X, y] = fileReader(filename)

    % load the dataset
    dataset = load(filename);

    % get the dimensions of the dataset
    nrows = size(dataset, 1);
    ncols = size(dataset, 2);

    % generate the X matrix from the dataset
    X = dataset(:, 1 : ncols - 1);

    % generate the y vector
    y = dataset(:, ncols);

    % append 1's to the X matrix
    X = [ones(nrows, 1), X];
end

function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)

    % get the number of rows and columns
    nrows = size(X, 1);
    ncols = size(X, 2);

    % run gradient descent for the specified number of iterations
    count = 1;

    while count <= iterations

        % initialize the hypothesis vector
        h = zeros(nrows, 1);

        % initialize the temporary theta vector
        theta_temp = zeros(ncols, 1);


        % calculate the hypothesis values and fill into the vector
        for i = 1 : nrows
            for j = 1 : ncols
                term = theta(j) * X(i, j);
                h(i) = h(i) + term;
            end
        end

        % calculate the gradient
        for j = 1 : ncols
            for i = 1 : nrows
                term = (h(i) - y(i)) * X(i, j);
                theta_temp(j) = theta_temp(j) + term;
            end
        end

        % update the gradient with the factor
        fact = alpha / nrows;

        for i = 1 : ncols
            theta_temp(i) = fact * theta_temp(i);
        end

        % update the theta
        for i = 1 : ncols
            theta(i) = theta(i) - theta_temp(i);
        end

        % update the count
        count = count + 1;   % portable form; count += 1 is Octave-only
    end
end
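One way to confirm the fix is to run the loop-based update and the vectorized update side by side on the same data and compare the resulting theta. The sketch below inlines both updates so that it runs as a plain script; the toy data, learning rate, iteration count, and the names theta_loop and theta_vec are assumptions chosen for illustration.

```matlab
% Hypothetical toy data: y = 1 + 2*x
X = [ones(4, 1), (1:4)'];
y = [3; 5; 7; 9];
alpha = 0.05;
iterations = 3;
nrows = size(X, 1);
ncols = size(X, 2);
theta_loop = zeros(ncols, 1);
theta_vec = zeros(ncols, 1);

for count = 1 : iterations
    % loop version: temporaries are reset at the start of every iteration
    h = zeros(nrows, 1);
    theta_temp = zeros(ncols, 1);
    for i = 1 : nrows
        for j = 1 : ncols
            h(i) = h(i) + theta_loop(j) * X(i, j);
        end
    end
    for j = 1 : ncols
        for i = 1 : nrows
            theta_temp(j) = theta_temp(j) + (h(i) - y(i)) * X(i, j);
        end
    end
    theta_loop = theta_loop - (alpha / nrows) * theta_temp;

    % vectorized version of the same update
    theta_vec = theta_vec - (alpha / nrows) * (X' * (X * theta_vec - y));
end

% largest absolute difference between the two results
max_diff = max(abs(theta_loop - theta_vec));
disp(max_diff);
```

If the two zeros(...) lines are moved back above the for loop, the two results stop matching from the second iteration onward.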