Gradient descent vs. closed-form solution: different hypothesis lines in MATLAB

Tags: matlab, machine-learning, linear-regression, gradient-descent

I am writing up what I learned about linear regression from the Coursera machine learning course (in MATLAB). I found a similar post, but I couldn't quite understand all of it; perhaps that's because my machine learning fundamentals are a bit weak.

The problem I am facing is this: for some data, gradient descent (GD) and the closed-form solution (CFS) give the same hypothesis line, but on one particular dataset the results differ. I have read something saying that if the data is singular, the results should be the same. However, I don't know how to check whether my data is singular.
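For what it's worth, here is a minimal sketch of how such a check might look in MATLAB, assuming "singular" here means that x'*x is not invertible (which is what the closed-form solution requires); `x` is the m-by-2 matrix with the column of ones, as built in the code below:

    % Sketch: check whether x'*x is (near-)singular.
    if rank(x' * x) < size(x, 2)
        disp('x''*x is singular: the closed-form solution is not unique');
    end
    fprintf('cond(x''*x) = %g\n', cond(x' * x));  % a huge value means near-singular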

I will illustrate this as best I can:

1) First, here is the MATLAB code, adapted from the course exercise. For the dataset given below, everything is fine: GD and CFS give similar results.

Dataset:

X                   Y
2.06587460000000    0.779189260000000
2.36840870000000    0.915967570000000
2.53999290000000    0.905383540000000
2.54208040000000    0.905661380000000
2.54907900000000    0.938988900000000
2.78668820000000    0.966847400000000
2.91168250000000    0.964368240000000
3.03562700000000    0.914459390000000
3.11466960000000    0.939339440000000
3.15823890000000    0.960749710000000
3.32759440000000    0.898370940000000
3.37931650000000    0.912097390000000
3.41220060000000    0.942384990000000
3.42158230000000    0.966245780000000
3.53157320000000    1.05265000000000
3.63930020000000    1.01437910000000
3.67325370000000    0.959694260000000
3.92564620000000    0.968537160000000
4.04986460000000    1.07660650000000
4.24833480000000    1.14549780000000
4.34400520000000    1.03406250000000
4.38265310000000    1.00700090000000
4.42306020000000    0.966836480000000
4.61024430000000    1.08959190000000
4.68811830000000    1.06344620000000
4.97773330000000    1.12372390000000
5.03599670000000    1.03233740000000
5.06845360000000    1.08744520000000
5.41614910000000    1.07029880000000
5.43956230000000    1.16064930000000
5.45632070000000    1.07780370000000
5.56984580000000    1.10697580000000
5.60157290000000    1.09718750000000
5.68776170000000    1.16486030000000
5.72156020000000    1.14117960000000
5.85389140000000    1.08441560000000
6.19780260000000    1.12524930000000
6.35109410000000    1.11683410000000
6.47970330000000    1.19707890000000
6.73837910000000    1.20694620000000
6.86376860000000    1.12510460000000
7.02233870000000    1.12356720000000
7.07823730000000    1.21328290000000
7.15142320000000    1.25226520000000
7.46640230000000    1.24970650000000
7.59738740000000    1.17997060000000
7.74407170000000    1.18972990000000
7.77296620000000    1.30299340000000
7.82645140000000    1.26011340000000
7.93063560000000    1.25622670000000

My MATLAB code:

clear all; close all; clc; 
x = load('ex2x.dat');
y = load('ex2y.dat');

m = length(y); % number of training examples

% Plot the training data
figure; % open a new figure window
plot(x, y, '*r');
ylabel('Height in meters')
xlabel('Age in years')

% Gradient descent
x = [ones(m, 1) x]; % Add a column of ones to x
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
MAX_ITR = 1500;
alpha = 0.07;

for num_iterations = 1:MAX_ITR

    thetax = x * theta;
    % for theta_0 and x_0
    grad0 = (1/m) .* sum( x(:,1)' * (thetax - y));
    % for theta_1 and x_1
    grad1 = (1/m) .* sum( x(:,2)' * (thetax - y));

    % Here is the actual update
    theta(1) = theta(1) - alpha .* grad0;
    theta(2) = theta(2) - alpha .* grad1;
end
% print theta to screen
theta

% Plot the hypothesis (a.k.a. linear fit)
hold on
plot(x(:,2), x*theta, 'ob')
% Plot using the Closed Form Solution
plot(x(:,2), x*((x' * x)\x' * y), '--r')
legend('Training data', 'Linear regression', 'Closed Form')
hold off % don't overlay any more plots on this figure
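As an aside on the closed-form line above: `(x' * x)\x' * y` solves the normal equations explicitly, while applying MATLAB's backslash to `x` itself computes the same least-squares fit via a QR factorization, which tends to be better conditioned. A quick sketch:

    % Two equivalent ways to get the least-squares parameters (sketch):
    theta_cfs = (x' * x) \ (x' * y);  % normal equations, as in the plot call above
    theta_qr  = x \ y;                % QR-based least squares; numerically safer
    % For well-conditioned data the two agree to machine precision.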
[EDIT: Sorry for the wrong labelling... it is not the normal equation but the closed-form solution. My mistake.] The result of this code is shown below, which is peachy :D (the same results from GD and CFS).

2) Now I am testing my code with another dataset, the grey kangaroo dataset. I converted it to CSV and read it into MATLAB. Note that I scaled the data (divided by the maximum), because otherwise no hypothesis line shows up at all and theta comes out as Not-a-Number (NaN) in MATLAB. Grey kangaroo dataset:

    
    X    Y
    609 241
    629 222
    620 233
    564 207
    645 247
    493 189
    606 226
    660 240
    630 215
    672 231
    778 263
    616 220
    727 271
    810 284
    778 279
    823 272
    755 268
    710 278
    701 238
    803 255
    855 308
    838 281
    830 288
    864 306
    635 236
    565 204
    562 216
    580 225
    596 220
    597 219
    636 201
    559 213
    615 228
    740 234
    677 237
    675 217
    629 211
    692 238
    710 221
    730 281
    763 292
    686 251
    717 231
    737 275
    816 275
    
The changes I made to the code to read in this dataset:

    
    dataset = load('kangaroo.csv');
    % scale by the maximum so gradient descent does not blow up
    x = dataset(:,1)/max(dataset(:,1));
    y = dataset(:,2)/max(dataset(:,2));
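Dividing by the maximum is only one way to tame the feature scale. A common alternative is z-score normalization; a minimal sketch (my own variant, using the same `dataset` variable as above):

    % Sketch: zero-mean, unit-variance scaling instead of dividing by the max.
    x = (dataset(:,1) - mean(dataset(:,1))) / std(dataset(:,1));
    y = dataset(:,2);  % scaling the target is optional
    % Without some scaling, features in the hundreds make the gradient so large
    % that alpha = 0.07 overshoots on every step and theta blows up to NaN.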
    
With that change, the result looks like this: [EDIT: Sorry for the wrong labelling... it is not the normal equation but the closed-form solution. My mistake.]


I would like to know whether there is any explanation for this difference. Any help would be appreciated. Thank you in advance.

I haven't run your code, but let me give you some theory:

If your code is correct (and it looks like it is): increase MAX_ITR and it will look better.

Gradient descent is not guaranteed to converge within MAX_ITR iterations; in fact, gradient descent is a very slow method in terms of convergence.

For a "standard" convex function like the one you are trying to solve, the convergence of gradient descent looks like this (plot from the internet): [cost vs. iterations: the curve drops steeply at first, then flattens out]

Ignore the exact iteration counts, since they are problem-dependent, and focus on the shape. What may be happening is that your MAX_ITR lands somewhere around the "20" mark on that curve, so your result is good, but not the best.
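If you want to see where on that curve you are, record the cost at every iteration and plot it. A sketch built on the variables from your code (x, y, m, theta, alpha, MAX_ITR):

    % Sketch: track the least-squares cost per iteration to watch convergence.
    J = zeros(MAX_ITR, 1);
    for num_iterations = 1:MAX_ITR
        grad = (1/m) .* (x' * (x * theta - y));  % both partial derivatives at once
        theta = theta - alpha .* grad;
        J(num_iterations) = (1/(2*m)) * sum((x * theta - y).^2);
    end
    figure; plot(J);
    xlabel('iteration'); ylabel('cost J(\theta)');
    % If the curve is still visibly sloping down at MAX_ITR, GD has not converged.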

However, directly solving the normal equations gives you the minimum squared-error solution (I assume that by normal equations you mean x = (A'*A)^(-1)*A'*b). The problem is that in many cases you cannot store A in memory, or, in ill-posed problems, the normal equations lead to ill-conditioned matrices that are numerically unstable; that is why gradient descent is used instead.
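To make the conditioning point concrete: forming A'*A roughly squares the condition number of A, which you can check directly. A sketch, assuming A is the design matrix from your code:

    % Sketch: the normal equations square the condition number.
    A = x;                                       % design matrix with the ones column
    fprintf('cond(A)    = %g\n', cond(A));
    fprintf('cond(A''A) = %g\n', cond(A' * A));  % roughly cond(A)^2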

I think I've got it. I prematurely assumed that a maximum of 1500 iterations would be enough. I tried higher values (5k and 10k) and the two algorithms started giving similar solutions, so my main problem was the number of iterations: it needs more of them to converge properly on that dataset :D
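In case it helps anyone else: stopping on a tolerance instead of a fixed count removes the guesswork about MAX_ITR. A sketch along those lines (the tolerance value is arbitrary):

    % Sketch: iterate until the update is negligible rather than a fixed count.
    tol = 1e-9;
    max_iter = 100000;                           % generous safety bound
    for num_iterations = 1:max_iter
        step = alpha .* (1/m) .* (x' * (x * theta - y));
        theta = theta - step;
        if norm(step) < tol                      % converged closely enough
            break
        end
    end
    fprintf('stopped after %d iterations\n', num_iterations);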

Thanks for the explanation. Apparently I made a mistake in the question itself: it's not the normal equation but the closed-form solution. Sorry for the trouble, but thanks for the explanation anyway... I will move in the direction of normal equations after this.