Machine learning: mxnet training makes no progress


Thanks in advance for your help.

I'm having trouble getting mxnet models to converge to anything: they seem to get stuck close to their initial weights.

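Below is a working example (though at the moment I'm struggling to get many models like this to work). I've run the code below over a range of epochs (up to 100) and learning rates (0.001 to 10), but I can't get any sensible results out of it.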

import mxnet as mx
import numpy as np

inputs = np.expand_dims(np.random.uniform(size=10000), axis=1)
labels = np.sin(inputs)

data_iter = mx.io.NDArrayIter(data=inputs, label=labels, data_name='data', label_name='label', batch_size=50)

data = mx.sym.Variable('data')
label = mx.sym.Variable('label')

fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
ac1 = mx.sym.Activation(data=fc1, act_type='relu')

fc2 = mx.sym.FullyConnected(data=ac1, num_hidden=64)
ac2 = mx.sym.Activation(data=fc2, act_type='relu')

fc3 = mx.sym.FullyConnected(data=ac2, num_hidden=16)
ac3 = mx.sym.Activation(data=fc3, act_type='relu')

output = mx.sym.FullyConnected(data=ac3, num_hidden=1)
loss = mx.symbol.MakeLoss(mx.symbol.square(output - label), name="loss")

model = mx.module.Module(symbol=loss, data_names=('data',), label_names=('label',))

import logging
logging.getLogger().setLevel(logging.DEBUG)
model.fit(data_iter,
          optimizer='sgd',
          optimizer_params={'learning_rate':0.1},
          eval_metric='mse',
          num_epoch=5)
This results in:

INFO:root:Epoch[0] Train-mse=0.221155
INFO:root:Epoch[0] Time cost=0.173
INFO:root:Epoch[1] Train-mse=0.225179
INFO:root:Epoch[1] Time cost=0.176
INFO:root:Epoch[2] Train-mse=0.225179
INFO:root:Epoch[2] Time cost=0.179
INFO:root:Epoch[3] Train-mse=0.225179
INFO:root:Epoch[3] Time cost=0.176
INFO:root:Epoch[4] Train-mse=0.225179
INFO:root:Epoch[4] Time cost=0.183

Clearly, the training isn't really progressing.
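One thing worth checking in the code above, as I understand MXNet's Module API: `MakeLoss` makes the symbol's forward output the loss value itself, so `eval_metric='mse'` compares the squared errors against the labels instead of comparing predictions against labels. The reported `Train-mse` is then not the regression MSE. The NumPy sketch below (no MXNet needed; `mse`, `preds` and the constant-output model are hypothetical stand-ins) computes both quantities for the same predictions to show how far apart they can be.

```python
import numpy as np

def mse(label, pred):
    """Mean squared error between labels and predictions."""
    return float(np.mean((label - pred) ** 2))

# Same data-generating process as the question.
rng = np.random.default_rng(0)
inputs = rng.uniform(size=(10000, 1))
labels = np.sin(inputs)

# Hypothetical predictions: a model stuck at outputting the label mean.
preds = np.full_like(labels, labels.mean())

# Assumption: with MakeLoss, the forward output is square(output - label),
# so that array (not `preds`) is what the metric receives as its "prediction".
makeloss_output = (preds - labels) ** 2

true_mse = mse(labels, preds)                # the regression error we want to monitor
metric_value = mse(labels, makeloss_output)  # what eval_metric='mse' would see

print(f"true MSE:                  {true_mse:.6f}")
print(f"metric on MakeLoss output: {metric_value:.6f}")
```

`LinearRegressionOutput` does not have this problem, because its forward output is the prediction itself, so the same metric measures the actual regression error.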

I updated your code a little and was able to get it to converge; the code is pasted below.

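The changes I made: I reduced the network to two fully connected layers with 128 units each, switched the loss to the built-in linear regression output (LinearRegressionOutput), added momentum and changed the learning rate, and finally ran more epochs.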
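To illustrate what adding momentum changes, here is a generic NumPy sketch of SGD with classical (heavy-ball) momentum on a toy one-dimensional quadratic with very low curvature, a stand-in for a long, flat plateau. It is not MXNet's optimizer code: the toy loss, learning rate and step count are arbitrary choices, and MXNet's SGD may differ in details such as weight decay and gradient rescaling. The velocity accumulates past gradients, so many small, consistent gradients add up to much larger steps.

```python
import numpy as np

def sgd(grad_fn, w0, lr, momentum=0.0, steps=100):
    """Plain SGD when momentum == 0, heavy-ball momentum otherwise."""
    w = float(w0)
    velocity = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        velocity = momentum * velocity - lr * g  # running, decaying sum of past steps
        w = w + velocity
    return w

# Toy loss L(w) = 0.5 * a * (w - 3)^2 with small curvature a (a shallow valley).
A = 0.01

def grad(w):
    return A * (w - 3.0)

w_plain = sgd(grad, w0=0.0, lr=0.5, momentum=0.0)
w_mom = sgd(grad, w0=0.0, lr=0.5, momentum=0.9)
print(f"plain SGD:     w = {w_plain:.4f}")
print(f"with momentum: w = {w_mom:.4f}  (optimum at 3.0)")
```

With the same learning rate and number of steps, the plain update is still far from the optimum, while the momentum run ends up close to it.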

Hope this helps.

import mxnet as mx
import numpy as np

inputs = np.expand_dims(np.random.uniform(size=10000), axis=1)
labels = np.sin(inputs)

data_iter = mx.io.NDArrayIter(data=inputs, label=labels, data_name='data', label_name='label', batch_size=50)

data = mx.sym.Variable('data')
label = mx.sym.Variable('label')

fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
ac1 = mx.sym.Activation(data=fc1, act_type='relu')

fc2 = mx.sym.FullyConnected(data=ac1, num_hidden=128)
ac2 = mx.sym.Activation(data=fc2, act_type='relu')

output = mx.sym.FullyConnected(data=ac2, num_hidden=1)
#loss = mx.symbol.MakeLoss(mx.symbol.square(output - label), name="loss")
loss = mx.sym.LinearRegressionOutput(data=output, label=label, name="loss")

model = mx.module.Module(symbol=loss, data_names=('data',), label_names=('label',))

import logging
logging.getLogger().setLevel(logging.DEBUG)
model.fit(data_iter,
          optimizer='sgd',
          optimizer_params={'learning_rate':0.005, 'momentum': 0.9},
          eval_metric='mse',
          num_epoch=50)
Result:

INFO:root:Epoch[0] Train-mse=0.076923
INFO:root:Epoch[0] Time cost=0.148
INFO:root:Epoch[1] Train-mse=0.061155
INFO:root:Epoch[1] Time cost=0.178
INFO:root:Epoch[2] Train-mse=0.061154
INFO:root:Epoch[2] Time cost=0.168
INFO:root:Epoch[3] Train-mse=0.061153
INFO:root:Epoch[3] Time cost=0.151
INFO:root:Epoch[4] Train-mse=0.061151
INFO:root:Epoch[4] Time cost=0.182
INFO:root:Epoch[5] Train-mse=0.061150
INFO:root:Epoch[5] Time cost=0.186
INFO:root:Epoch[6] Train-mse=0.061149
INFO:root:Epoch[6] Time cost=0.197
INFO:root:Epoch[7] Train-mse=0.061147
INFO:root:Epoch[7] Time cost=0.174
INFO:root:Epoch[8] Train-mse=0.061145
INFO:root:Epoch[8] Time cost=0.148
INFO:root:Epoch[9] Train-mse=0.061142
INFO:root:Epoch[9] Time cost=0.150
INFO:root:Epoch[10] Train-mse=0.061140
INFO:root:Epoch[10] Time cost=0.145
INFO:root:Epoch[11] Train-mse=0.061136
INFO:root:Epoch[11] Time cost=0.135
INFO:root:Epoch[12] Train-mse=0.061133
INFO:root:Epoch[12] Time cost=0.136
INFO:root:Epoch[13] Train-mse=0.061128
INFO:root:Epoch[13] Time cost=0.137
INFO:root:Epoch[14] Train-mse=0.061122
INFO:root:Epoch[14] Time cost=0.146
INFO:root:Epoch[15] Train-mse=0.061116
INFO:root:Epoch[15] Time cost=0.135
INFO:root:Epoch[16] Train-mse=0.061108
INFO:root:Epoch[16] Time cost=0.152
INFO:root:Epoch[17] Train-mse=0.061098
INFO:root:Epoch[17] Time cost=0.179
INFO:root:Epoch[18] Train-mse=0.061086
INFO:root:Epoch[18] Time cost=0.160
INFO:root:Epoch[19] Train-mse=0.061069
INFO:root:Epoch[19] Time cost=0.151
INFO:root:Epoch[20] Train-mse=0.061050
INFO:root:Epoch[20] Time cost=0.145
INFO:root:Epoch[21] Train-mse=0.061024
INFO:root:Epoch[21] Time cost=0.164
INFO:root:Epoch[22] Train-mse=0.060990
INFO:root:Epoch[22] Time cost=0.151
INFO:root:Epoch[23] Train-mse=0.060944
INFO:root:Epoch[23] Time cost=0.141
INFO:root:Epoch[24] Train-mse=0.060881
INFO:root:Epoch[24] Time cost=0.136
INFO:root:Epoch[25] Train-mse=0.060790
INFO:root:Epoch[25] Time cost=0.124
INFO:root:Epoch[26] Train-mse=0.060658
INFO:root:Epoch[26] Time cost=0.151
INFO:root:Epoch[27] Train-mse=0.060455
INFO:root:Epoch[27] Time cost=0.166
INFO:root:Epoch[28] Train-mse=0.060131
INFO:root:Epoch[28] Time cost=0.148
INFO:root:Epoch[29] Train-mse=0.059582
INFO:root:Epoch[29] Time cost=0.219
INFO:root:Epoch[30] Train-mse=0.058581
INFO:root:Epoch[30] Time cost=0.160
INFO:root:Epoch[31] Train-mse=0.056593
INFO:root:Epoch[31] Time cost=0.178
INFO:root:Epoch[32] Train-mse=0.052252
INFO:root:Epoch[32] Time cost=0.184
INFO:root:Epoch[33] Train-mse=0.042274
INFO:root:Epoch[33] Time cost=0.168
INFO:root:Epoch[34] Train-mse=0.023321
INFO:root:Epoch[34] Time cost=0.162
INFO:root:Epoch[35] Train-mse=0.005860
INFO:root:Epoch[35] Time cost=0.161
INFO:root:Epoch[36] Train-mse=0.000848
INFO:root:Epoch[36] Time cost=0.164
INFO:root:Epoch[37] Train-mse=0.000319
INFO:root:Epoch[37] Time cost=0.176
INFO:root:Epoch[38] Train-mse=0.000221
INFO:root:Epoch[38] Time cost=0.148
INFO:root:Epoch[39] Train-mse=0.000163
INFO:root:Epoch[39] Time cost=0.199
INFO:root:Epoch[40] Train-mse=0.000123
INFO:root:Epoch[40] Time cost=0.141
INFO:root:Epoch[41] Train-mse=0.000096
INFO:root:Epoch[41] Time cost=0.133
INFO:root:Epoch[42] Train-mse=0.000078
INFO:root:Epoch[42] Time cost=0.144
INFO:root:Epoch[43] Train-mse=0.000065
INFO:root:Epoch[43] Time cost=0.174
INFO:root:Epoch[44] Train-mse=0.000056
INFO:root:Epoch[44] Time cost=0.208
INFO:root:Epoch[45] Train-mse=0.000050
INFO:root:Epoch[45] Time cost=0.152
INFO:root:Epoch[46] Train-mse=0.000045
INFO:root:Epoch[46] Time cost=0.154
INFO:root:Epoch[47] Train-mse=0.000041
INFO:root:Epoch[47] Time cost=0.151
INFO:root:Epoch[48] Train-mse=0.000039
INFO:root:Epoch[48] Time cost=0.177
INFO:root:Epoch[49] Train-mse=0.000036
INFO:root:Epoch[49] Time cost=0.135
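A useful reference point for reading these logs: a model that ignores its input and always predicts the mean of the labels has an MSE equal to the label variance. For x uniform on [0, 1] and y = sin(x), that variance has the closed form E[y^2] - E[y]^2 with E[y] = 1 - cos(1) and E[y^2] = 1/2 - sin(2)/4, which is about 0.0614. That is close to the ~0.0611 plateau in roughly epochs 1 to 30 above, which suggests the network was effectively outputting a constant before it started to fit the curve. The sketch below compares the closed form with a sampled estimate.

```python
import numpy as np

# Closed form for x ~ U(0, 1), y = sin(x):
#   E[y]   = 1 - cos(1)
#   E[y^2] = 1/2 - sin(2)/4
mean_y = 1.0 - np.cos(1.0)
mean_y2 = 0.5 - np.sin(2.0) / 4.0
var_analytic = mean_y2 - mean_y ** 2

# Monte Carlo check with the same data-generating process as the question.
rng = np.random.default_rng(0)
labels = np.sin(rng.uniform(size=10000))
var_sampled = float(labels.var())  # equals the MSE of always predicting labels.mean()

print(f"analytic variance: {var_analytic:.6f}")
print(f"sampled variance:  {var_sampled:.6f}")
```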

I suggest trying the following; note the initializer line:

model.fit(data_iter,
          optimizer='sgd',
          initializer=mx.init.Xavier(),  # here it is; you may also try other initializers
          optimizer_params={'learning_rate':0.005, 'momentum': 0.9},
          eval_metric='mse',
          num_epoch=50)
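It seems you don't specify an initializer, so fit falls back to its default, mx.init.Uniform(0.01): the weights start from a narrow uniform distribution around zero, and the biases start at zero. In that case the weights change very little and the gradients may vanish, or the differences between layers are so small that the network behaves like a linear model instead of capturing the nonlinearity in the data. See also these articles.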
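To make the scale argument concrete, the NumPy sketch below pushes the question's inputs through the answer's 1-128-128-1 ReLU network twice: once with weights drawn from U(-0.01, 0.01), matching the Uniform(0.01) default, and once with a Xavier-style uniform init. The Xavier scale sqrt(3 / ((fan_in + fan_out) / 2)) is my reading of MXNet's defaults (uniform, averaged fan, magnitude 3) and should be treated as an assumption; biases are zero in both runs, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=(10000, 1))
layer_sizes = [1, 128, 128, 1]  # the two-hidden-layer network from the answer

def uniform_init(fan_in, fan_out, scale=0.01):
    return rng.uniform(-scale, scale, size=(fan_in, fan_out))

def xavier_init(fan_in, fan_out, magnitude=3.0):
    # Assumed Xavier variant: uniform distribution, averaged fan-in/fan-out.
    scale = np.sqrt(magnitude / ((fan_in + fan_out) / 2.0))
    return rng.uniform(-scale, scale, size=(fan_in, fan_out))

def activation_stds(init_fn):
    """Std of each layer's output at initialization (biases are zero)."""
    weights = [init_fn(n_in, n_out)
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    h, stds = x, []
    for i, w in enumerate(weights):
        h = h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
        stds.append(float(h.std()))
    return stds

std_uniform = activation_stds(uniform_init)
std_xavier = activation_stds(xavier_init)
for layer, (u, v) in enumerate(zip(std_uniform, std_xavier), start=1):
    print(f"layer {layer}: Uniform(0.01) std {u:.2e}, Xavier std {v:.2e}")
```

With the narrow uniform init, activations shrink by orders of magnitude from layer to layer. Since backpropagated gradients are multiplied by the same small weights, the early layers also receive tiny updates, which is consistent with the long plateau in the log above.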


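You should try a tanh activation on the output layer, so that the range of sin matches the range of the network's output.

Good point. This was a quick example I put together, but my problem applies even to more sensible output layers :) I'm using mxnet incorrectly somehow, but I can't see where!

Actually, the code itself was mostly fine, but I needed to be more careful with the optimizer settings and the loss function. Thanks for your help :)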