Machine learning: why doesn't my neural network seem to learn anything? (built from scratch)


I'm currently taking Andrew Ng's machine learning course on Coursera. I've implemented a neural network from scratch with the knowledge I gained from it, but no matter what I do the network doesn't seem to learn anything.

Dataset details: I'm using the MNIST handwritten digit database (digits 0-9). It has 5000 training examples; I've taken the first 4999 as training data.

The network consists of three layers: an input layer (400 nodes), a hidden layer (26 nodes), and an output layer (10 nodes).

The Octave code:

Main.m

load('ex4data1.mat');


X = X(1:end-1, :);
y = y(1:end-1, :);

size(X) % Dimension = 4999x400
size(y) % Dimension = 4999x10

noOfFeatures = size(X,2) + 1;

%Theta1 = randInitializeWeights(25,noOfFeatures); % Dimension = 25x401
%Theta2 = randInitializeWeights(26,10); % Dimension = 26x10


Theta = [Theta1(:); Theta2(:)];

learning_rate = 0.01;

for iter = 1:200,
  [J, grad] = nnCostGrad(Theta, X, y);
  Theta = Theta - (learning_rate * grad);
  plot(iter,J);
  hold on;
endfor

predict(Theta, [0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   8.5606e-06  1.9404e-06  -0.00073744 -0.008134   -0.01861    -0.018741   -0.018757   -0.019096   -0.016404   -0.0037819  0.00033035  1.2766e-05  0   0   0   0   0   0   0   0.00011642  0.00012005  -0.014044   -0.028454   0.080383    0.26654 0.27385 0.27873 0.27429 0.22468 0.027756    -0.0070632  0.00023472  0   0   0   0   0   0   1.2834e-17  -0.00032629 -0.013865   0.081565    0.3828  0.85785 1.0011  0.96971 0.93093 1.0038  0.96416 0.44926 -0.0056041  -0.0037832  0   0   0   0   5.1062e-06  0.00043641  -0.0039551  -0.026854   0.10076 0.64203 1.0314  0.85097 0.54312 0.3426  0.26892 0.66837 1.0126  0.9038  0.10448 -0.016642   0   0   0   0   2.5988e-05  -0.0031061  0.0075246   0.17754 0.79289 0.96563 0.46317 0.069172    -0.003641   -0.041218   -0.05019    0.1561  0.90176 1.0475  0.15106 -0.021604   0   0   0   5.8701e-05  -0.00064093 -0.032331   0.2782  0.93672 1.0432  0.598   -0.0035941  -0.021675   -0.0048102  6.1657e-05  -0.012377   0.15548 0.91487 0.9204  0.10917 -0.017106   0   0   0.00015625  -0.00042772 -0.025147   0.13053 0.78166 1.0284  0.75714 0.28467 0.0048687   -0.0031869  0   0.00083649  -0.037075   0.45264 1.0318  0.53903 -0.0024374  -0.0048029  0   0   -0.00070364 -0.012726   0.16171 0.77987 1.0368  0.80449 0.16059 -0.013817   0.0021488   -0.00021262 0.00020425  -0.0068591  0.00043171  0.72068 0.84814 0.15138 -0.02284    0.00019897  0   0   -0.0094041  0.037452    0.69439 1.0284  1.0165  0.88049 0.39212 -0.017412   -0.0001201  5.5522e-05  -0.0022391  -0.027607   0.36865 0.93641 0.45901 -0.04247    0.0011736   1.8893e-05  0   0   -0.019351   0.13    0.97982 0.94186 0.77515 0.87363 0.21278 -0.017235   0   0.0010994   -0.026179   0.12287 0.83081 0.7265  0.052444    -0.0061897  
0   0   0   0   -0.0093656  0.036835    0.69908 1.0029  0.6057  0.3273  -0.03221    -0.048305   -0.043407   -0.057515   0.095567    0.72651 0.69537 0.14711 -0.012005   -0.0003028  0   0   0   0   -0.00067657 -0.0065142  0.11734 0.42195 0.99321 0.88201 0.74576 0.72387 0.72334 0.72002 0.84532 0.83186 0.068883    -0.027777   0.00035914  7.1487e-05  0   0   0   0   0.00015319  0.00031735  -0.022917   -0.004144   0.38704 0.50458 0.77489 0.99004 1.0077  1.0085  0.73791 0.21546 -0.026962   0.0013251   0   0   0   0   0   0   0   0   0.00023637  -0.0022603  -0.025199   -0.037389   0.066212    0.29113 0.32306 0.30626 0.087607    -0.025058   0.00023744  0   0   0   0   0   0   0   0   0   0   0   6.2094e-18  0.00067262  -0.011315   -0.035464   -0.038821   -0.037108   -0.013352   0.00099096  4.8918e-05  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0])
nnCostGrad.m

function [J, grad] = nnCostGrad (Theta, X, Y)

  X = [ones(size(X,1),1) X]; % Dimension = 4999x401

  m = size(X,1);

  Theta1 = reshape(Theta(1:10025), 25,401); % Dimension = 25x401
  Theta2 = reshape(Theta(10026:end), 10, 26); % Dimension = 26x10


  % Forward Propagation

  a_1 = X;
  z_2 = a_1 * Theta1'; % Dimension = 4999x25
  a_2 = [ones(size(X,1),1) sigmoid(z_2)]; % Dimension = 4999x26

  z_3 = a_2 * Theta2'; % Dimension = 4999x10
  a_3 = sigmoid(z_3);

  modified_y = zeros(size(Y,1), 10);
  for i=1:size(Y,1),
    modified_y(i,Y(i)) = 1;
  endfor

  regularize_term = (1/(2*m)) *  ( sum(sum(Theta1(:,2:end).^2,2),1) + sum(sum(Theta2(:,2:end).^2,2),1) );
  J = (1/m) * sum(sum(- ( modified_y .* log(a_3) + (1-modified_y) .* log(1-a_3) ),2),1) + regularize_term;



  % Back Propagation

    delta1 = 0;
    delta2 = 0;

    d3 = a_3 - modified_y; % Dimension = 4999x10
    d2 = (d3 * Theta2)(:, 2:end) .* sigmoidGradient(z_2); % Dimension = (4999x10 * 26x10')(:, 2:end) .* 4999x25 => 4999x25 .* 4999x25 = 4999x25

    delta1 = delta1 + (d2' * a_1); % Dimension = delta1 + (4999x25' * 4999x401) = 25x401 
    delta2 = delta2 + (d3' * a_2); % Dimension = delta2 + (4999x10' * 4999x26) = 10x26
        size(delta2)

    Theta1_grad = (1/m) * delta1; % Dimension = 25x401
    Theta2_grad = (1/m) * delta2; % Dimension = 10x26
        size(Theta2_grad)

    regularized_Theta1 = ((1/m) * Theta1);
    regularized_Theta2 = ((1/m) * Theta2);
            size(regularized_Theta2)

    Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + regularized_Theta1(:,2:end);
    Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + regularized_Theta2(:,2:end);

grad = [Theta1_grad(:); Theta2_grad(:)] ;    

endfunction
predict.m

function a_3 = predict (Theta, X)
  X = [ones(size(X,1),1) X];
  Theta1 = reshape(Theta(1:10025), 25,401); % Dimension = 25x401
  Theta2 = reshape(Theta(10026:end), 10, 26); % Dimension = 26x10

  a_1 = X;
  z_2 = a_1 * Theta1'; % Dimension = 4999x25
  a_2 = [ones(size(X,1),1) sigmoid(z_2)]; % Dimension = 4999x26

  z_3 = a_2 * Theta2'; % Dimension = 4999x10
  sigmoid(z_3)
  [number, index] = max(sigmoid(z_3), [],2);
  a_3 = index;
endfunction
sigmoid.m

function val = sigmoid (z)
  val = 1 ./ (1+exp(-z));
endfunction
sigmoidGradient.m

function val = sigmoidGradient (z)
  val = sigmoid(z) .* (1-sigmoid(z));
endfunction
After 100 iterations the cost is 3.2709, and the prediction comes out as 10 when it should be 1. The prediction set is

I don't know where I'm going wrong. Please help, and thanks in advance.


See the full code here -

It's been a while since I did that exercise, but let's see if I can help a little. Some random thoughts (I haven't gone through the code thoroughly):

(1) Why have you commented out the random weight initialization? That is exactly the kind of thing that makes learning get stuck. As written, main.m leaves Theta1 and Theta2 undefined, so they pick up whatever values happen to be in the workspace, and without the randomization you commented out you risk heavy redundancy between hidden units, or getting trapped in a local minimum.

(2) I'm not sure about your predict function. It looks like sigmoid() is not applied consistently around the hidden-layer bias weights, and there is a spurious sigmoid(z_3) call whose result is never used.

(3) You have some hard-coded array sizes, which is never good and can hide bugs. Sometimes size() will do; if the functions are tightly linked, the sizes could instead be passed as parameters for consistency. You also seem to re-add the bias column to X every time it is passed into a new function such as nnCostGrad. Why not add it once at load time and let it pass through when X is handed around as a parameter? Code that is easier to follow is easier to debug (besides cutting run time), and you will spot dimension mismatches at run time more readily by relying on the parameters passed along with the function.

Also, main.m has no hyperparameter controlling the strength of the regularization (in practice you could set it to 0 until regularization is needed).

Not sure whether any of these helps; no smoking gun!
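On point (1), a minimal sketch of the kind of initializer the commented-out call suggests. The epsilon value of 0.12 and the (rows, cols) argument order are assumptions, not taken from the question; adapt them to however your randInitializeWeights is actually defined:

```
% Sketch: build a rows-by-cols weight matrix with small random values
% in [-epsilon_init, epsilon_init] to break the symmetry between hidden
% units (identical initial weights produce identical gradient updates).
function W = randInitializeWeights (rows, cols)
  epsilon_init = 0.12;   % assumed range; any small positive value works
  W = rand(rows, cols) * 2 * epsilon_init - epsilon_init;
endfunction
```

With this version, Theta1 = randInitializeWeights(25, 401) and Theta2 = randInitializeWeights(10, 26) produce matrices whose shapes match the reshape calls inside nnCostGrad.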