MATLAB: Andrew Ng's logistic regression without fminunc


I have been working through Andrew Ng's machine learning course and I am now on logistic regression. I am trying to find the parameters and compute the cost without using MATLAB's fminunc function. However, I am not getting the correct results that other students, who completed the assignment with fminunc, have posted. Specifically, my problems are:

  • the parameters theta are incorrect
  • my cost seems to be going up
  • there are many NaNs in my cost vector (I only create a cost vector to keep track)

I tried to find the parameters through gradient descent, as far as I understood the material. However, my implementation still seems to give me the wrong results.

dataset = load('dataStuds.txt');
x = dataset(:,1:end-1);
y = dataset(:,end);
m = length(x);

% Padding with the 1's (the intercept term, as they call it?)
x = [ones(length(x),1), x];
thetas = zeros(size(x,2),1);

% Setting the learning rate to 0.1
alpha = 0.1;


for i = 1:100000

    % theta transpose x (though why in MATLAB does it need to be done the
    % other way round? :))
    ttrx = x * thetas;
    % the hypothesis function h_x = g(z) = sigmoid(z) = 1 / (1 + exp(-z))
    h_x = 1 ./ (1 + exp(-ttrx));

    error = h_x - y;

    % the gradient (aka the derivative of J(\theta) aka the derivative
    % term)

    for j = 1:length(thetas)
        gradient = 1/m * (h_x - y)' * x(:,j);
        % Updating the parameters theta
        thetas(j) =  thetas(j) - alpha * gradient;
    end

    % Calculating the cost, just to keep track...
    cost(i) = 1/m * ( -y' * log(h_x) - (1-y)' * log(1-h_x) );
end

% Displaying the final theta's that I obtained
thetas
The theta parameters I get are:

thetas =

-482.8509
3.7457
2.6976
The results below come from an example I downloaded, where the author did use fminunc:

Cost at theta found by fminunc: 0.203506
theta: 
-24.932760 
0.204406 
0.199616 
The data:

34.6236596245170    78.0246928153624    0
30.2867107682261    43.8949975240010    0
35.8474087699387    72.9021980270836    0
60.1825993862098    86.3085520954683    1
79.0327360507101    75.3443764369103    1
45.0832774766834    56.3163717815305    0
61.1066645368477    96.5114258848962    1
75.0247455673889    46.5540135411654    1
76.0987867022626    87.4205697192680    1
84.4328199612004    43.5333933107211    1
95.8615550709357    38.2252780579509    0
75.0136583895825    30.6032632342801    0
82.3070533739948    76.4819633023560    1
69.3645887597094    97.7186919618861    1
39.5383391436722    76.0368108511588    0
53.9710521485623    89.2073501375021    1
69.0701440628303    52.7404697301677    1
67.9468554771162    46.6785741067313    0
70.6615095549944    92.9271378936483    1
76.9787837274750    47.5759636497553    1
67.3720275457088    42.8384383202918    0
89.6767757507208    65.7993659274524    1
50.5347882898830    48.8558115276421    0
34.2120609778679    44.2095285986629    0
77.9240914545704    68.9723599933059    1
62.2710136700463    69.9544579544759    1
80.1901807509566    44.8216289321835    1
93.1143887974420    38.8006703371321    0
61.8302060231260    50.2561078924462    0
38.7858037967942    64.9956809553958    0
61.3792894474250    72.8078873131710    1
85.4045193941165    57.0519839762712    1
52.1079797319398    63.1276237688172    0
52.0454047683183    69.4328601204522    1
40.2368937354511    71.1677480218488    0
54.6351055542482    52.2138858806112    0
33.9155001090689    98.8694357422061    0
64.1769888749449    80.9080605867082    1
74.7892529594154    41.5734152282443    0
34.1836400264419    75.2377203360134    0
83.9023936624916    56.3080462160533    1
51.5477202690618    46.8562902634998    0
94.4433677691785    65.5689216055905    1
82.3687537571392    40.6182551597062    0
51.0477517712887    45.8227014577600    0
62.2226757612019    52.0609919483668    0
77.1930349260136    70.4582000018096    1
97.7715992800023    86.7278223300282    1
62.0730637966765    96.7688241241398    1
91.5649744980744    88.6962925454660    1
79.9448179406693    74.1631193504376    1
99.2725269292572    60.9990309984499    1
90.5467141139985    43.3906018065003    1
34.5245138532001    60.3963424583717    0
50.2864961189907    49.8045388132306    0
49.5866772163203    59.8089509945327    0
97.6456339600777    68.8615727242060    1
32.5772001680931    95.5985476138788    0
74.2486913672160    69.8245712265719    1
71.7964620586338    78.4535622451505    1
75.3956114656803    85.7599366733162    1
35.2861128152619    47.0205139472342    0
56.2538174971162    39.2614725105802    0
30.0588224466980    49.5929738672369    0
44.6682617248089    66.4500861455891    0
66.5608944724295    41.0920980793697    0
40.4575509837516    97.5351854890994    1
49.0725632190884    51.8832118207397    0
80.2795740146700    92.1160608134408    1
66.7467185694404    60.9913940274099    1
32.7228330406032    43.3071730643006    0
64.0393204150601    78.0316880201823    1
72.3464942257992    96.2275929676140    1
60.4578857391896    73.0949980975804    1
58.8409562172680    75.8584483127904    1
99.8278577969213    72.3692519338389    1
47.2642691084817    88.4758649955978    1
50.4581598028599    75.8098595298246    1
60.4555562927153    42.5084094357222    0
82.2266615778557    42.7198785371646    0
88.9138964166533    69.8037888983547    1
94.8345067243020    45.6943068025075    1
67.3192574691753    66.5893531774792    1
57.2387063156986    59.5142819801296    1
80.3667560017127    90.9601478974695    1
68.4685217859111    85.5943071045201    1
42.0754545384731    78.8447860014804    0
75.4777020053391    90.4245389975396    1
78.6354243489802    96.6474271688564    1
52.3480039879411    60.7695052560259    0
94.0943311251679    77.1591050907389    1
90.4485509709636    87.5087917648470    1
55.4821611406959    35.5707034722887    0
74.4926924184304    84.8451368493014    1
89.8458067072098    45.3582836109166    1
83.4891627449824    48.3802857972818    1
42.2617008099817    87.1038509402546    1
99.3150088051039    68.7754094720662    1
55.3400175600370    64.9319380069486    1
74.7758930009277    89.5298128951328    1

I ran your code and it works fine. However, the tricky thing with gradient descent is making sure that the cost does not diverge to infinity. If you look at your cost array, you will see that the cost is definitely diverging, and that is why you are not getting the correct results.
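A quick way to check this yourself (a small diagnostic sketch, assuming the cost vector from your loop is still in the workspace):

% Inspect the cost history produced by the gradient descent loop
cost(end)            % final cost: huge or NaN when the learning rate is too large
nnz(isnan(cost))     % number of iterations that produced NaN
plot(cost), xlabel('Iteration'), ylabel('Cost J(\theta)')   % the divergence shows up clearly in the plot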

In your case, the best way to eliminate this is to decrease the learning rate. Through experimentation, I found that a learning rate of alpha = 0.003 works best for your problem. I also increased the number of iterations to 200000. Changing these two things gives me the following parameters and associated cost:

>> format long g;
>> thetas

thetas =

         -17.6287417780435
         0.146062780453677
         0.140513170941357

>> cost(end)

ans =

         0.214821863463963
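For completeness, here is a compact restatement of your script with only those two settings changed. It is equivalent to your per-coordinate loop, because h_x is computed once per outer iteration, so it should reproduce the numbers above up to floating-point round-off:

% Same steps as the question's script, with alpha and the iteration count changed
dataset = load('dataStuds.txt');
x = dataset(:, 1:end-1);
y = dataset(:, end);
m = length(y);

x = [ones(m, 1), x];                    % add the intercept column
thetas = zeros(size(x, 2), 1);

alpha = 0.003;                          % smaller learning rate, so the cost no longer diverges
num_iters = 200000;                     % more iterations to compensate for the smaller steps

cost = zeros(num_iters, 1);
for i = 1:num_iters
    h_x = 1 ./ (1 + exp(-(x * thetas)));                     % sigmoid hypothesis
    thetas = thetas - alpha * (1/m) * (x' * (h_x - y));      % simultaneous update of all thetas
    cost(i) = 1/m * (-y' * log(h_x) - (1-y)' * log(1-h_x));  % track the cost
end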
This is more or less in line with the magnitude of the parameters you see when using fminunc. However, it arrives at slightly different parameters and a slightly different cost because of the actual minimization method itself: fminunc uses a variant that finds the solution much more quickly.
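For reference, the fminunc-based solutions the other students posted are typically set up along these lines (a sketch only: costFunction is a hypothetical helper that returns both the cost and the gradient, x and y are the padded design matrix and labels from your script, and fminunc requires the Optimization Toolbox):

% costFunction.m -- hypothetical helper returning the logistic cost and its gradient
function [J, grad] = costFunction(theta, X, y)
    m = length(y);
    h = 1 ./ (1 + exp(-(X * theta)));                    % sigmoid hypothesis
    J = 1/m * (-y' * log(h) - (1 - y)' * log(1 - h));    % logistic regression cost
    grad = 1/m * (X' * (h - y));                         % gradient of the cost
end

% In the main script, with x already padded with the column of ones:
options = optimset('GradObj', 'on', 'MaxIter', 400);
initial_theta = zeros(size(x, 2), 1);
[theta_opt, cost_opt] = fminunc(@(t) costFunction(t, x, y), initial_theta, options);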

What matters most is the actual accuracy itself. Remember that to classify whether an example belongs to label 0 or label 1, you take the weighted sum of the parameters and the example's features, run it through the sigmoid function, and threshold at 0.5. We then find the average number of times each expected label matches its predicted label.

Using the parameters we found through gradient descent, we get the following accuracy:

>> ttrx = x * thetas;
>> h_x = 1 ./ (1 + exp(-ttrx)) >= 0.5;
>> mean(h_x == y)

ans =

                      0.89
This means that we have achieved a classification accuracy of 89%. Using the parameters provided by fminunc also gives:

>> thetas2 = [-24.932760; 0.204406; 0.199616];
>> ttrx = x * thetas2;
>> h_x = 1 ./ (1 + exp(-ttrx)) >= 0.5;
>> mean(h_x == y)

ans =

                      0.89
Therefore, we can see that the accuracy is the same, so I would not worry too much about the magnitude of the parameters; it is more informative to compare the costs between the two implementations, which are in the same range.
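If you want to confirm that, you can evaluate the same cost expression at both parameter vectors; the results should land near the 0.2148 and 0.203506 figures quoted above (a sketch reusing the workspace variables from the transcripts):

% Cost at the gradient descent parameters (same formula as in the training loop)
h_gd = 1 ./ (1 + exp(-(x * thetas)));
cost_gd = 1/m * (-y' * log(h_gd) - (1 - y)' * log(1 - h_gd))

% Cost at the fminunc parameters quoted above
h_fm = 1 ./ (1 + exp(-(x * thetas2)));
cost_fm = 1/m * (-y' * log(h_fm) - (1 - y)' * log(1 - h_fm))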


As a final point, I suggest you take a look at this post of mine for some tips on how to make logistic regression work well in the long run. I definitely recommend normalizing your features before finding the parameters so that the algorithm runs faster. It also addresses why you were finding the wrong parameters (namely, the cost blowing up):
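A minimal sketch of that normalization, assuming z-score scaling per column of the raw features from dataStuds.txt, applied before adding the column of ones (note that the theta learned on scaled features is on a different scale from the values quoted above):

dataset = load('dataStuds.txt');
x_raw = dataset(:, 1:end-1);
y = dataset(:, end);

% z-score normalization: zero mean and unit standard deviation per feature column
mu = mean(x_raw);
sigma = std(x_raw);
x_norm = (x_raw - mu) ./ sigma;   % implicit expansion (R2016b+); use bsxfun on older releases

% Add the intercept column after normalizing, then run gradient descent as before;
% with scaled features you can typically use a much larger learning rate and far fewer iterations
x = [ones(size(x_norm, 1), 1), x_norm];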

Thanks for the clarification, @rayryeng. I sort of figured I was on the right track, but you confirmed it. I now have a much better idea of the various parameters you need to know about and deal with when using GD. By the way, you also helped me a lot over here. Thanks again, man.

Oh wow, your name did ring a bell. I remember that post! You're welcome. Let me know if you need any more help, otherwise I hope you'll accept the answer!