What is the error in the iterative implementation of the gradient descent algorithm below (Matlab)?
I tried to implement an iterative version of the gradient descent algorithm, but it does not work correctly. The vectorized implementation of the same algorithm, however, works fine.
Here is the iterative implementation:
function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
  % get the number of rows and columns
  nrows = size(X, 1);
  ncols = size(X, 2);
  % initialize the hypothesis vector
  h = zeros(nrows, 1);
  % initialize the temporary theta vector
  theta_temp = zeros(ncols, 1);
  % run gradient descent for the specified number of iterations
  count = 1;
  while count <= iterations
    % calculate the hypothesis values and fill into the vector
    for i = 1 : nrows
      for j = 1 : ncols
        term = theta(j) * X(i, j);
        h(i) = h(i) + term;
      end
    end
    % calculate the gradient
    for j = 1 : ncols
      for i = 1 : nrows
        term = (h(i) - y(i)) * X(i, j);
        theta_temp(j) = theta_temp(j) + term;
      end
    end
    % update the gradient with the factor
    fact = alpha / nrows;
    for i = 1 : ncols
      theta_temp(i) = fact * theta_temp(i);
    end
    % update the theta
    for i = 1 : ncols
      theta(i) = theta(i) - theta_temp(i);
    end
    % update the count
    count += 1;
  end
end
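The problem with the first code is that the vectors h and theta_temp are not initialized correctly. In the first iteration (when count equals 1) the code works, because for that iteration h and theta_temp really are zero vectors. They are temporary vectors, however, and must start from zero in every iteration of gradient descent, but they are never reset after the first one. So from iteration 2 on, the new terms are added to h(i) and theta_temp(i) on top of the values left over from the previous iteration, and the computed gradient is wrong. You need to reset both vectors to zero vectors at the start of each iteration; then the code works correctly. My corrected version of your first code (note the changes) is the gradientDescent_i function at the end of this thread.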
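Have you tried stepping through the code and comparing the results of each iteration step between the two algorithms? Is it the second iteration that goes wrong? If so, theta_temp may not be initialized properly.

Note that both versions are iterative; you cannot do gradient descent without iterating. Also, are you sure you wrote count += 1 in the "non-vectorized" implementation? As far as I know, that is not valid Matlab syntax.

@FlorisS += is valid in Octave, and the question is tagged octave too, so I assume the OP is actually using Octave.

@CrisLuengo Yes. First I checked the output of the hypothesis vector of both codes at each step, and it turned out to be correct, so I guess the code goes wrong in the gradient computation step, since that is where the vector values start to differ in each iteration (the first code is the wrong one). Maybe I initialized theta_temp incorrectly, so I will check and confirm. By "iterative" I meant the non-vectorized version: the vectorized code uses matrix multiplication for all computations, which is much faster than the loop code (the gradient updates themselves, of course, still have to be iterated).

Perhaps call it "loop code" to avoid confusion. In any case, I have not looked at your code in enough detail to understand everything it does, but one thing that jumps out at me is that theta_temp is initialized to zero before the first iteration, not between iterations. Maybe that is intentional, but it looks like it could be a bug.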
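The stale-accumulator problem described in the answer (and suspected in the last comment) can be reproduced in a few lines. The sketch below uses hypothetical toy data (X, y, alpha are made up). For brevity it replaces the inner for loops of the original code with the equivalent matrix products: h = h + X*theta is exactly what the double loop over i and j computes. It runs the original, never-reset update next to a correct update and prints how far apart the two theta vectors are after each iteration.

```octave
% Hypothetical toy data: intercept column plus one feature
X = [1 1; 1 2; 1 3; 1 4];
y = [3; 5; 7; 9];
alpha = 0.05;
iterations = 3;
[m, n] = size(X);

theta_buggy = zeros(n, 1);
theta_fixed = zeros(n, 1);
diffs = zeros(1, iterations);

% As in the original loop code: accumulators allocated once, before the loop
h = zeros(m, 1);
theta_temp = zeros(n, 1);

for it = 1:iterations
  % Original update: h and theta_temp still hold the previous iteration's values
  h = h + X * theta_buggy;                  % same as the double loop over i, j
  theta_temp = theta_temp + X' * (h - y);   % same as the double loop over j, i
  theta_temp = (alpha / m) * theta_temp;    % scale by alpha / nrows
  theta_buggy = theta_buggy - theta_temp;

  % Correct update: hypothesis and gradient recomputed from scratch
  grad = X' * (X * theta_fixed - y);
  theta_fixed = theta_fixed - (alpha / m) * grad;

  diffs(it) = max(abs(theta_buggy - theta_fixed));
  fprintf('iteration %d: max |theta_buggy - theta_fixed| = %g\n', it, diffs(it));
end
```

The first iteration agrees exactly, because both accumulators still start at zero; from the second iteration on, theta_temp carries over the previous (scaled) step and the two versions diverge.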
Here is the vectorized implementation, which works correctly:
function [theta, theta_all, J_cost] = gradientDescent(X, y, theta, alpha)
  % set the learning rate
  learn_rate = alpha;
  % set the number of iterations
  n = 1500;
  % number of training examples
  m = length(y);
  % initialize the theta_new vector
  l = length(theta);
  theta_new = zeros(l, 1);
  % initialize the cost vector
  J_cost = zeros(n, 1);
  % initialize the matrix that stores all the calculated theta values
  % (one column per parameter, so it also works with more than 2 parameters)
  theta_all = zeros(n, l);
  theta_all(1, :) = theta';   % row 1 holds the initial theta
  % perform gradient descent for the specified number of iterations
  for i = 1 : n
    % calculate the hypothesis
    hypothesis = X * theta;
    % calculate the error
    err = hypothesis - y;
    % calculate the gradient
    grad = X' * err;
    % calculate the step (scaled gradient)
    theta_new = (learn_rate/m) .* grad;
    % update the old theta
    theta = theta - theta_new;
    % update the cost (computeCost is defined elsewhere and not shown here)
    J_cost(i) = computeCost(X, y, theta);
    % store the calculated theta value
    if i < n
      index = i + 1;
      theta_all(index, :) = theta';
    end
  end
end
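Writing the loops of the iterative version as sums shows that both functions compute the same update. Here m is the number of training examples (nrows) and n the number of parameters (ncols):

```latex
h_i = \sum_{j=1}^{n} \theta_j X_{ij}
  \;\Longleftrightarrow\; h = X\theta

g_j = \sum_{i=1}^{m} (h_i - y_i)\, X_{ij}
  \;\Longleftrightarrow\; g = X^{\top}(h - y)

\theta \leftarrow \theta - \frac{\alpha}{m}\, g
  = \theta - \frac{\alpha}{m}\, X^{\top}(X\theta - y)
```

Both sums are meant to start from zero for every update of theta. The loop version does not do this when h and theta_temp are allocated only once, outside the while loop.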
Helper function used to load the dataset:
function [X, y] = fileReader(filename)
  % load the dataset (numeric text file, last column = target)
  dataset = load(filename);
  % get the dimensions of the dataset
  nrows = size(dataset, 1);
  ncols = size(dataset, 2);
  % generate the X matrix from the dataset
  X = dataset(:, 1 : ncols - 1);
  % generate the y vector
  y = dataset(:, ncols);
  % append 1's to the X matrix
  X = [ones(nrows, 1), X];
end
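A quick way to exercise the fileReader logic without a real dataset is to write a small comma-separated file first. The sketch below inlines the same steps as fileReader so it runs as a standalone script. The data values and the temporary file name are made up for illustration.

```octave
% Hypothetical two-column data: one feature, one target
data = [1 2; 2 4; 3 6];
fname = [tempname() '.txt'];
dlmwrite(fname, data);            % writes comma-separated numeric text

% Same steps as fileReader
dataset = load(fname);
nrows = size(dataset, 1);
ncols = size(dataset, 2);
X = [ones(nrows, 1), dataset(:, 1 : ncols - 1)];   % intercept column + features
y = dataset(:, ncols);                             % last column is the target

delete(fname);                    % clean up the temporary file
disp(X);
disp(y);
```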
Corrected loop implementation from the answer, with h and theta_temp reset at the start of every iteration:
function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
  % get the number of rows and columns
  nrows = size(X, 1);
  ncols = size(X, 2);
  % run gradient descent for the specified number of iterations
  count = 1;
  while count <= iterations
    % reset the hypothesis vector to zero in every iteration
    h = zeros(nrows, 1);
    % reset the temporary theta (gradient) vector to zero in every iteration
    theta_temp = zeros(ncols, 1);
    % calculate the hypothesis values and fill into the vector
    for i = 1 : nrows
      for j = 1 : ncols
        term = theta(j) * X(i, j);
        h(i) = h(i) + term;
      end
    end
    % calculate the gradient
    for j = 1 : ncols
      for i = 1 : nrows
        term = (h(i) - y(i)) * X(i, j);
        theta_temp(j) = theta_temp(j) + term;
      end
    end
    % update the gradient with the factor
    fact = alpha / nrows;
    for i = 1 : ncols
      theta_temp(i) = fact * theta_temp(i);
    end
    % update the theta
    for i = 1 : ncols
      theta(i) = theta(i) - theta_temp(i);
    end
    % update the count (count += 1 also works, but only in Octave)
    count = count + 1;
  end
end
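To confirm the fix, one can run the corrected loop update next to the vectorized update from gradientDescent on synthetic data and compare the resulting theta vectors. The sketch below is a standalone script. The data (exactly y = 1 + 2x), alpha = 0.05 and 1500 iterations are made-up choices. Both update rules are inlined rather than calling the two functions, so no .m files and no computeCost are needed.

```octave
% Hypothetical synthetic data: y = 1 + 2*x exactly
x = (1:5)';
X = [ones(5, 1), x];
y = 1 + 2 * x;
alpha = 0.05;
iterations = 1500;
[nrows, ncols] = size(X);

theta_loop = zeros(ncols, 1);
theta_vec = zeros(ncols, 1);

for count = 1:iterations
  % Corrected loop version: accumulators reset in every iteration
  h = zeros(nrows, 1);
  theta_temp = zeros(ncols, 1);
  for i = 1:nrows
    for j = 1:ncols
      h(i) = h(i) + theta_loop(j) * X(i, j);
    end
  end
  for j = 1:ncols
    for i = 1:nrows
      theta_temp(j) = theta_temp(j) + (h(i) - y(i)) * X(i, j);
    end
  end
  theta_loop = theta_loop - (alpha / nrows) * theta_temp;

  % Vectorized version, same update as in gradientDescent
  theta_vec = theta_vec - (alpha / nrows) * (X' * (X * theta_vec - y));
end

fprintf('max |theta_loop - theta_vec| = %g\n', max(abs(theta_loop - theta_vec)));
disp([theta_loop, theta_vec]);
```

The two theta vectors should agree up to floating-point rounding, because the loops and the matrix product add the terms in a different order. For this data both should approach [1; 2].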