Machine learning: why doesn't my neural network seem to learn anything? (built from scratch)

Tags: machine-learning, neural-network, octave

I am currently taking Andrew Ng's machine learning course on Coursera, and I implemented a neural network from scratch with the knowledge I gained from it. But no matter what I do, the network does not seem to learn anything.

Dataset details: I used the MNIST handwritten-digit database (digits 0-9). It has 5000 examples; I have taken the first 4999 as training data.

The network consists of three layers: an input layer (400 nodes), a hidden layer (26 nodes), and an output layer (10 nodes).

Octave program:

Main.m
load('ex4data1.mat');
X = X(1:end-1, :);
y = y(1:end-1, :);
size(X) % Dimension = 4999x400
size(y) % Dimension = 4999x10
noOfFeatures = size(X,2) + 1;
%Theta1 = randInitializeWeights(25,noOfFeatures); % Dimension = 25x401
%Theta2 = randInitializeWeights(26,10); % Dimension = 26x10
Theta = [Theta1(:); Theta2(:)];
learning_rate = 0.01;
for iter = 1:200
  [J, grad] = nnCostGrad(Theta, X, y);
  Theta = Theta - (learning_rate * grad);
  plot(iter, J);
  hold on;
endfor
predict(Theta, [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8.5606e-06 1.9404e-06 -0.00073744 -0.008134 -0.01861 -0.018741 -0.018757 -0.019096 -0.016404 -0.0037819 0.00033035 1.2766e-05 0 0 0 0 0 0 0 0.00011642 0.00012005 -0.014044 -0.028454 0.080383 0.26654 0.27385 0.27873 0.27429 0.22468 0.027756 -0.0070632 0.00023472 0 0 0 0 0 0 1.2834e-17 -0.00032629 -0.013865 0.081565 0.3828 0.85785 1.0011 0.96971 0.93093 1.0038 0.96416 0.44926 -0.0056041 -0.0037832 0 0 0 0 5.1062e-06 0.00043641 -0.0039551 -0.026854 0.10076 0.64203 1.0314 0.85097 0.54312 0.3426 0.26892 0.66837 1.0126 0.9038 0.10448 -0.016642 0 0 0 0 2.5988e-05 -0.0031061 0.0075246 0.17754 0.79289 0.96563 0.46317 0.069172 -0.003641 -0.041218 -0.05019 0.1561 0.90176 1.0475 0.15106 -0.021604 0 0 0 5.8701e-05 -0.00064093 -0.032331 0.2782 0.93672 1.0432 0.598 -0.0035941 -0.021675 -0.0048102 6.1657e-05 -0.012377 0.15548 0.91487 0.9204 0.10917 -0.017106 0 0 0.00015625 -0.00042772 -0.025147 0.13053 0.78166 1.0284 0.75714 0.28467 0.0048687 -0.0031869 0 0.00083649 -0.037075 0.45264 1.0318 0.53903 -0.0024374 -0.0048029 0 0 -0.00070364 -0.012726 0.16171 0.77987 1.0368 0.80449 0.16059 -0.013817 0.0021488 -0.00021262 0.00020425 -0.0068591 0.00043171 0.72068 0.84814 0.15138 -0.02284 0.00019897 0 0 -0.0094041 0.037452 0.69439 1.0284 1.0165 0.88049 0.39212 -0.017412 -0.0001201 5.5522e-05 -0.0022391 -0.027607 0.36865 0.93641 0.45901 -0.04247 0.0011736 1.8893e-05 0 0 -0.019351 0.13 0.97982 0.94186 0.77515 0.87363 0.21278 -0.017235 0 0.0010994 -0.026179 0.12287 0.83081 0.7265 0.052444 -0.0061897 0 0 0 0 -0.0093656 0.036835 0.69908 1.0029 0.6057 0.3273 -0.03221 -0.048305 -0.043407 -0.057515 0.095567 0.72651 0.69537 0.14711 -0.012005 -0.0003028 0 0 0 0 -0.00067657 -0.0065142 0.11734 0.42195 0.99321 0.88201 0.74576 0.72387 0.72334 0.72002 0.84532 0.83186 0.068883 -0.027777 0.00035914 7.1487e-05 0 0 0 0 0.00015319 0.00031735 -0.022917 -0.004144 
0.38704 0.50458 0.77489 0.99004 1.0077 1.0085 0.73791 0.21546 -0.026962 0.0013251 0 0 0 0 0 0 0 0 0.00023637 -0.0022603 -0.025199 -0.037389 0.066212 0.29113 0.32306 0.30626 0.087607 -0.025058 0.00023744 0 0 0 0 0 0 0 0 0 0 0 6.2094e-18 0.00067262 -0.011315 -0.035464 -0.038821 -0.037108 -0.013352 0.00099096 4.8918e-05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0])
nnCostGrad.m
function [J, grad] = nnCostGrad(Theta, X, Y)
X = [ones(size(X,1),1) X]; % Dimension = 4999x401
m = size(X,1);
Theta1 = reshape(Theta(1:10025), 25,401); % Dimension = 25x401
Theta2 = reshape(Theta(10026:end), 10, 26); % Dimension = 26x10
% Forward Propagation
a_1 = X;
z_2 = a_1 * Theta1'; % Dimension = 4999x25
a_2 = [ones(size(X,1),1) sigmoid(z_2)]; % Dimension = 4999x26
z_3 = a_2 * Theta2'; % Dimension = 4999x10
a_3 = sigmoid(z_3);
modified_y = zeros(size(Y,1), 10);
for i=1:size(Y,1),
modified_y(i,Y(i)) = 1;
endfor
regularize_term = (1/(2*m)) * ( sum(sum(Theta1(:,2:end).^2,2),1) + sum(sum(Theta2(:,2:end).^2,2),1) );
J = (1/m) * sum(sum(- ( modified_y .* log(a_3) + (1-modified_y) .* log(1-a_3) ),2),1) + regularize_term;
% Back Propagation
delta1 = 0;
delta2 = 0;
d3 = a_3 - modified_y; % Dimension = 4999x10
d2 = (d3 * Theta2)(:, 2:end) .* sigmoidGradient(z_2); % Dimension = (4999x10 * 26x10')(:, 2:end) .* 4999x25 => 4999x25 .* 4999x25 = 4999x25
delta1 = delta1 + (d2' * a_1); % Dimension = delta1 + (4999x25' * 4999x401) = 25x401
delta2 = delta2 + (d3' * a_2); % Dimension = delta2 + (4999x10' * 4999x26) = 10x26
size(delta2)
Theta1_grad = (1/m) * delta1; % Dimension = 25x401
Theta2_grad = (1/m) * delta2; % Dimension = 10x26
size(Theta2_grad)
regularized_Theta1 = ((1/m) * Theta1);
regularized_Theta2 = ((1/m) * Theta2);
size(regularized_Theta2)
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + regularized_Theta1(:,2:end);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + regularized_Theta2(:,2:end);
grad = [Theta1_grad(:); Theta2_grad(:)] ;
endfunction
predict.m
function a_3 = predict(Theta, X)
X = [ones(size(X,1),1) X];
Theta1 = reshape(Theta(1:10025), 25,401); % Dimension = 25x401
Theta2 = reshape(Theta(10026:end), 10, 26); % Dimension = 26x10
a_1 = X;
z_2 = a_1 * Theta1'; % Dimension = 4999x25
a_2 = [ones(size(X,1),1) sigmoid(z_2)]; % Dimension = 4999x26
z_3 = a_2 * Theta2'; % Dimension = 4999x10
sigmoid(z_3)
[number, index] = max(sigmoid(z_3), [],2);
a_3 = index;
endfunction
sigmoid.m
function val = sigmoid(z)
val = 1 ./ (1+exp(-z));
endfunction
sigmoidGradient.m
function val = sigmoidGradient(z)
val = sigmoid(z) .* (1-sigmoid(z));
endfunction
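Since the question is about a network that will not learn, one generic debugging step for backpropagation code like the above is numerical gradient checking: compare the analytic gradient against a central finite-difference estimate of the cost. A minimal sketch in NumPy (used here for illustration, with a toy quadratic cost rather than the network's cross-entropy; `numerical_gradient` is a hypothetical helper, not part of the question's code):

```python
import numpy as np

def numerical_gradient(cost, theta, eps=1e-4):
    """Central-difference estimate of d(cost)/d(theta_i) for each parameter."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (cost(theta + step) - cost(theta - step)) / (2 * eps)
    return grad

# Toy cost J(theta) = sum(theta^2); the analytic gradient is 2*theta.
theta = np.array([1.0, -2.0, 0.5])
analytic = 2 * theta
numeric = numerical_gradient(lambda t: np.sum(t ** 2), theta)
print(np.max(np.abs(analytic - numeric)))
```

If the same comparison is run against `nnCostGrad`'s `grad` output and the difference is not tiny, the backpropagation itself is wrong; if it is tiny, the problem lies elsewhere (initialization, learning rate, and so on).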
After 100 iterations the cost is 3.2709, and the network predicts 10 for the example shown above when it should predict 1.

I don't know where I went wrong. Please help, and thanks in advance.
See the full code above. It has been a while since I did that exercise, but let's see if I can help a little. Some random thoughts; I have not gone through the code thoroughly:

(1) Why did you comment out the random weight initialization? That is exactly the kind of thing that makes learning get stuck. As it stands, your Thetas in main.m are undefined and will pick up whatever values happen to be in the workspace. Without the randomization you commented out, you risk introducing heavy redundancy between hidden units, or getting trapped in a local minimum.

(2) I'm not sure about your predict.m function. It looks like you don't apply sigmoid() to the hidden-layer bias weights, and there is a spurious sigmoid(z_3) call whose result you never use.

(3) You have some hard-coded array sizes, which is never good and can hide bugs. Sometimes size() will do, but if the functions are tightly linked it may be better to pass the dimensions as parameters for consistency. You also seem to re-add the bias column to X inside every function it is passed to; why not add it once at load time and let it flow through as a parameter? Easier-to-follow code means easier debugging (on top of the reduced runtime), and combined with using size() to extract dimensions you will spot runtime errors much sooner.

(4) There is no hyperparameter in main.m to control the strength of the regularization (in practice you can set it to 0 until the network starts learning at all).

Not sure whether any of this helps; no smoking gun!
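On point (1): the usual fix is to initialize each Theta with small random values drawn uniformly from [-epsilon, epsilon] to break the symmetry between hidden units. A sketch of what the commented-out randInitializeWeights presumably does (written in NumPy for illustration; the epsilon value of 0.12 is an assumption, taken from the course materials' suggestion, not from the question's code):

```python
import numpy as np

def rand_initialize_weights(l_in, l_out, epsilon=0.12):
    """Random weights for a layer with l_in inputs (plus one bias unit)
    and l_out outputs, drawn uniformly from [-epsilon, epsilon]."""
    return np.random.uniform(-epsilon, epsilon, size=(l_out, l_in + 1))

Theta1 = rand_initialize_weights(400, 25)  # shape (25, 401)
Theta2 = rand_initialize_weights(25, 10)   # shape (10, 26)
print(Theta1.shape, Theta2.shape)
```

With all-equal (or leftover workspace) initial weights, every hidden unit receives the same gradient and the layer collapses to one effective unit; the random initialization is what lets the units differentiate.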