Translating a Lasagne neural network to Deeplearning4j

I am translating a Lasagne neural network into Deeplearning4j code. So far I have managed to get the layers in place, but I am not sure whether the rest of the configuration is equivalent. I am not an expert on neural networks, and I am having a hard time finding the equivalent functions/methods in Deeplearning4j.

Here is the Lasagne Python code:

    conv_net = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('conv1a', layers.Conv2DLayer),
        ('conv1', layers.Conv2DLayer),
        ('pool1', layers.MaxPool2DLayer),
        ('dropout1', layers.DropoutLayer),
        ('conv2a', layers.Conv2DLayer),
        ('conv2', layers.Conv2DLayer),
        ('pool2', layers.MaxPool2DLayer),
        ('dropout2', layers.DropoutLayer),
        ('conv3a', layers.Conv2DLayer),
        ('conv3', layers.Conv2DLayer),
        ('pool3', layers.MaxPool2DLayer),
        ('dropout3', layers.DropoutLayer),
        ('hidden4', layers.DenseLayer),
        ('dropout4', layers.DropoutLayer),
        ('hidden5', layers.DenseLayer),
        ('output', layers.DenseLayer),
    ],

    input_shape=(None, NUM_CHANNELS, IMAGE_SIZE, IMAGE_SIZE),
    conv1a_num_filters=16, conv1a_filter_size=(7, 7), conv1a_nonlinearity=leaky_rectify,
    conv1_num_filters=32, conv1_filter_size=(5, 5), conv1_nonlinearity=leaky_rectify, pool1_pool_size=(2, 2), dropout1_p=0.1,
    conv2a_num_filters=64, conv2a_filter_size=(5, 5), conv2a_nonlinearity=leaky_rectify,
    conv2_num_filters=64, conv2_filter_size=(3, 3), conv2_nonlinearity=leaky_rectify, pool2_pool_size=(2, 2), dropout2_p=0.2,
    conv3a_num_filters=256, conv3a_filter_size=(3, 3), conv3a_nonlinearity=leaky_rectify,
    conv3_num_filters=256, conv3_filter_size=(3, 3), conv3_nonlinearity=leaky_rectify, pool3_pool_size=(2, 2), dropout3_p=0.2,
    hidden4_num_units=1250, dropout4_p=0.75, hidden5_num_units=1000,
    output_num_units=y.shape[1], output_nonlinearity=None,

    batch_iterator_train=AugmentBatchIterator(batch_size=180),

    update_learning_rate=theano.shared(np.cast['float32'](0.03)),
    update_momentum=theano.shared(np.cast['float32'](0.9)),

    on_epoch_finished=[
        AdjustVariable('update_learning_rate', start=0.01, stop=0.0001),
        AdjustVariable('update_momentum', start=0.9, stop=0.999),
        StoreBestModel('wb_' + out_file_name)
    ],

    regression=True,
    max_epochs=600,
    train_split=0.1,
    verbose=1,
)

    conv_net.batch_iterator_train.part_flips = flip_idxs
    conv_net.load_params_from('wb_keypoint_net3.pk')

    conv_net.fit(X, y)
And here is what I have so far in Deeplearning4j:

    int batch = 100;
    int iterations = data.getX().size(0) / batch + 1;
    int epochs = 600;
    logger.warn("Building model");
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(Updater.NESTEROVS).momentum(0.9)
            .activation(Activation.RELU)
            .weightInit(WeightInit.XAVIER)
            .learningRate(0.3)
            .learningRateDecayPolicy(LearningRatePolicy.Score)
            .lrPolicyDecayRate(0.1)
            .regularization(true).l2(1e-4)
            .list()
            .layer(0, new ConvolutionLayer.Builder(7, 7).activation(Activation.LEAKYRELU).nOut(16).build()) //rectified linear units
            .layer(1, new ConvolutionLayer.Builder(5, 5).nOut(32).activation(Activation.LEAKYRELU).build())
            .layer(2, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).build())
            .layer(3, new DropoutLayer.Builder(0.1).build())
            .layer(4, new ConvolutionLayer.Builder(5, 5).nOut(64).activation(Activation.LEAKYRELU).build())
            .layer(5, new ConvolutionLayer.Builder(3, 3).nOut(64).activation(Activation.LEAKYRELU).build())
            .layer(6, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).build())
            .layer(7, new DropoutLayer.Builder(0.2).build())
            .layer(8, new ConvolutionLayer.Builder(3, 3).nOut(256).activation(Activation.LEAKYRELU).build())
            .layer(9, new ConvolutionLayer.Builder(3, 3).nOut(256).activation(Activation.LEAKYRELU).build())
            .layer(10, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).build())
            .layer(11, new DropoutLayer.Builder(0.2).build())
            .layer(12, new DenseLayer.Builder().nOut(1250).build())
            .layer(13, new DropoutLayer.Builder(0.75).build())
            .layer(14, new DenseLayer.Builder().nOut(1000).build())
            .layer(15, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .nOut(data.getY().size(1)).activation(Activation.SOFTMAX).build())
            .setInputType(InputType.convolutional(image_size, image_size, num_channels))
            .backprop(true).pretrain(false)
            .build();

    MultiLayerNetwork model = new MultiLayerNetwork(conf);
    DataSet dataSet = new DataSet(data.getX(), data.getY());


    MiniBatchFileDataSetIterator iterator1 = new MiniBatchFileDataSetIterator(dataSet, batch);


    model.init();
    logger.warn("Train model");

    model.setListeners(new ScoreIterationListener(iterations));
    UtilSaveLoadMultiLayerNetwork uslmln = new UtilSaveLoadMultiLayerNetwork();
    for (int i = 0; i < epochs; i++) {
        logger.warn("Started epoch " + i);
        model.fit(iterator1);
        uslmln.save(model, filename);
    }
What I am mainly interested in is whether the activation functions and the configuration are equivalent. The problem is that when I run the neural network in Java it does not seem to learn at all: even after 50 epochs the score stays around 0.2 with no visible improvement, so I am sure something is misconfigured.


Thanks

Is your data pipeline exactly the same? That includes things like normalization as well. With Deeplearning4j you do not need to specify the number of outputs; we do that for you. Also, you are using the UI server wrong. Our examples demonstrate how to do all of this:
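For reference, a minimal sketch of what fitting and attaching a normalizer on the DL4J side could look like, assuming ND4J's NormalizerStandardize (the iterator name matches the code above; treat the exact wiring as an assumption, not the canonical pipeline):

    // Collect per-feature mean/stddev over the training data, then attach the
    // normalizer so every mini-batch the iterator emits is standardized.
    NormalizerStandardize normalizer = new NormalizerStandardize();
    normalizer.fit(iterator1);             // gathers the statistics
    iterator1.reset();
    iterator1.setPreProcessor(normalizer); // applies them to each batch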

I am not sure what led you to this, but you are reattaching the stats storage every time, so the actual statistics of the neural network never persist over time. You should set that up above the for loop, not inside it. If you want to take model snapshots the way you are trying to do here, you probably want to use early stopping.
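As a hedged sketch of that early-stopping suggestion, using the classes from DL4J's org.deeplearning4j.earlystopping package (the validationIterator and the "models/" save directory are placeholders, not taken from the question):

    // Train, score on a held-out iterator after every epoch, and keep the
    // best-scoring snapshot on disk instead of saving inside the epoch loop.
    EarlyStoppingConfiguration<MultiLayerNetwork> esConf =
            new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
                    .epochTerminationConditions(new MaxEpochsTerminationCondition(600))
                    .scoreCalculator(new DataSetLossCalculator(validationIterator, true))
                    .evaluateEveryNEpochs(1)
                    .modelSaver(new LocalFileModelSaver("models/"))
                    .build();

    EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, conf, iterator1);
    EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();
    MultiLayerNetwork bestModel = result.getBestModel();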

Also, how did you arrive at that bias learning rate? It does not even appear in your Lasagne configuration; it looks arbitrary, and I would suggest dropping it. The output layer also looks wrong. You should use negative log likelihood and softmax (again, take a look at our examples, it is all in there).
From the looks of it, learning rate decay is used in the Lasagne script as well. Deeplearning4j supports that too; I would go through our examples to see how it is done. We support several learning rate decay policies. You should be able to find them in the javadoc () or through your IDE's autocompletion.
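For illustration only, one of those policies in the 0.x builder API might look like the fragment below (exponential decay; the 0.99 factor is a made-up value, not something derived from the Lasagne AdjustVariable schedule):

    // Swap the Score-based policy for an explicit per-iteration decay.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .learningRate(0.03)                                      // initial rate
            .learningRateDecayPolicy(LearningRatePolicy.Exponential) // decay policy
            .lrPolicyDecayRate(0.99);                                // hypothetical factor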

Thanks for the help! Yes, the pipeline is exactly the same. I have removed the UI part and I am using negative log likelihood and softmax for the output layer. But now the score of the first iteration is negative and the second is NaN. I am still doing something wrong and I do not know what. INFO: Score at iteration 0 is -13.107518335451656. INFO: Score at iteration 21 is NaN. If I remove the output I get some errors...

Is the data already normalized? Follow these steps and see how far you get:

Yes, it is normalized. I am training the network with 2000 images. In the dataset, X is an ndarray of 2000x128x128x3 and Y is an ndarray of 2000x16. The values of X are 0.24, 0.17, 0.12, etc., and the values of Y are -0.55, -0.40, -0.62, etc. Are the X values too small?

No, that is a good sign. Your learning rate is not the same as in the Python script. The other thing is the learning rate decay.

I have updated the learning rate (from 0.1 to 0.3) and tried all of the learningRateDecayPolicy types. The result is the same: the first iteration is negative and the rest are NaN. I do not know what I am doing wrong.
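Putting that last suggestion in concrete terms, a hedged sketch of an updater block that actually mirrors the Python script's values (0.03 from update_learning_rate and 0.9 from update_momentum), rather than the 0.1 or 0.3 tried above:

    // The Lasagne script starts at 0.03, an order of magnitude below 0.3; a
    // learning rate that is too high is a common way to end up with NaN scores.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .updater(Updater.NESTEROVS).momentum(0.9) // update_momentum = 0.9
            .learningRate(0.03);                      // update_learning_rate = 0.03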