Neural network: max function on a regression neural network

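I'm teaching myself neural networks. There is one function I can't get my neural network to learn:

f(x) = max(x_1, x_2)

It seems like a very simple function, with 2 inputs and 1 output, yet a 3-layer neural network trained on more than 1000 samples for 2000 epochs gets it completely wrong. I'm using deeplearning4j.

Is there a reason why the max function would be very hard for a neural network to learn, or am I just tuning it wrong?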

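It's not that difficult, at least if you restrict x1 and x2 to an interval, for example [0,3]. I took the "RegressionSum" example from the deeplearning4j examples, quickly rewrote it to learn max instead of sum, and it works quite well. Here are the results: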

Max(0.6815540048808918,0.3112081053899819) = 0.64
Max(2.0073597506364407,1.93796211086664) = 2.09
Max(1.1792029272560556,2.5514324329058233) = 2.58
Max(2.489185375059013,0.0818746888836388) = 2.46
Max(2.658169689797984,1.419135581889197) = 2.66
Max(2.855509810112818,2.9661811672685086) = 2.98
Max(2.774757710538552,1.3988513143140069) = 2.79
Max(1.5852295273047565,1.1228662895771744) = 1.56
Max(0.8403435207065576,2.5595015474951195) = 2.60
Max(0.06913178775631723,2.61883825802004) = 2.54
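To quantify how close these predictions are, here is a small pure-Java sketch (no DL4J; the class name ReportedMaxErrors is hypothetical) that recomputes the absolute error of each printed prediction against Math.max. The predictions were printed with two decimals, so up to 0.005 of each error is just rounding.

```java
// Sketch: recomputes the absolute error of each prediction listed above.
// Predictions were printed with two decimals, so part of each error is rounding.
final class ReportedMaxErrors {
    // {x1, x2, printed prediction}, copied from the output above.
    static final double[][] ROWS = {
        { 0.6815540048808918, 0.3112081053899819, 0.64 },
        { 2.0073597506364407, 1.93796211086664, 2.09 },
        { 1.1792029272560556, 2.5514324329058233, 2.58 },
        { 2.489185375059013, 0.0818746888836388, 2.46 },
        { 2.658169689797984, 1.419135581889197, 2.66 },
        { 2.855509810112818, 2.9661811672685086, 2.98 },
        { 2.774757710538552, 1.3988513143140069, 2.79 },
        { 1.5852295273047565, 1.1228662895771744, 1.56 },
        { 0.8403435207065576, 2.5595015474951195, 2.60 },
        { 0.06913178775631723, 2.61883825802004, 2.54 }
    };

    // Returns {worst absolute error, mean absolute error} over all rows.
    static double[] errors() {
        double worst = 0.0;
        double sum = 0.0;
        for (double[] r : ROWS) {
            double err = Math.abs(r[2] - Math.max(r[0], r[1]));
            worst = Math.max(worst, err);
            sum += err;
        }
        return new double[] { worst, sum / ROWS.length };
    }

    public static void main(String[] args) {
        double[] e = errors();
        System.out.printf("worst = %.4f, mean = %.4f%n", e[0], e[1]);
    }
}
```

For these ten rows the worst error is about 0.08 (the Max(2.007..., 1.938...) row) and the mean absolute error is about 0.036.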

Below is my modified version of the RegressionSum example, which originally came from Anwar 3/15/16:

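// Imports assume a 0.x-era DL4J/ND4J release (roughly 0.5-0.7) that still has
// iterations(), learningRate(), Updater.NESTEROVS and string activations;
// package paths (e.g. for DataSetIterator) differ in other versions.
import java.util.Collections;
import java.util.List;
import java.util.Random;

import org.deeplearning4j.datasets.iterator.impl.ListDataSetIterator;
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.Updater;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class RegressionMax {
    //Random number generator seed, for reproducibility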
    public static final int seed = 12345;
    //Number of iterations per minibatch
    public static final int iterations = 1;
    //Number of epochs (full passes of the data)
    public static final int nEpochs = 200;
    //Number of data points
    public static final int nSamples = 10000;
    //Batch size: i.e., each epoch has nSamples/batchSize parameter updates
    public static final int batchSize = 100;
    //Network learning rate
    public static final double learningRate = 0.01;
    // The range of the sample data. The network is sensitive to the input range (e.g. 0-1); try other ranges and see how it affects the results,
    // and also try changing the range along with changing the activation function
    public static int MIN_RANGE = 0;
    public static int MAX_RANGE = 3;

    public static final Random rng = new Random(seed);

    public static void main(String[] args){

        //Generate the training data
        DataSetIterator iterator = getTrainingData(batchSize,rng);

        //Create the network
        int numInput = 2;
        int numOutputs = 1;
        int nHidden = 10;
        MultiLayerNetwork net = new MultiLayerNetwork(new NeuralNetConfiguration.Builder()
                .seed(seed)
                .iterations(iterations)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .learningRate(learningRate)
                .weightInit(WeightInit.XAVIER)
                .updater(Updater.NESTEROVS).momentum(0.9)
                .list()
                .layer(0, new DenseLayer.Builder().nIn(numInput).nOut(nHidden)
                        .activation("tanh")
                        .build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .activation("identity")
                        .nIn(nHidden).nOut(numOutputs).build())
                .pretrain(false).backprop(true).build()
        );
        net.init();
        net.setListeners(new ScoreIterationListener(1));


        //Train the network on the full data set for nEpochs epochs
        for( int i=0; i<nEpochs; i++ ){
            iterator.reset();
            net.fit(iterator);
        }

        // Test the max of some numbers (Try different numbers here)
        Random rand = new Random();
        for (int i= 0; i< 10; i++) {
            double d1 = MIN_RANGE + (MAX_RANGE - MIN_RANGE) * rand.nextDouble();
            double d2 =  MIN_RANGE + (MAX_RANGE - MIN_RANGE) * rand.nextDouble();
            INDArray input = Nd4j.create(new double[] { d1, d2 }, new int[] { 1, 2 });
            INDArray out = net.output(input, false);
            System.out.println("Max(" + d1 + "," + d2 + ") = " + out);
        }

    }

    private static DataSetIterator getTrainingData(int batchSize, Random rand){
        double [] max = new double[nSamples];
        double [] input1 = new double[nSamples];
        double [] input2 = new double[nSamples];
        for (int i= 0; i< nSamples; i++) {
            input1[i] = MIN_RANGE + (MAX_RANGE - MIN_RANGE) * rand.nextDouble();
            input2[i] =  MIN_RANGE + (MAX_RANGE - MIN_RANGE) * rand.nextDouble();
            max[i] = Math.max(input1[i], input2[i]);
        }
        INDArray inputNDArray1 = Nd4j.create(input1, new int[]{nSamples,1});
        INDArray inputNDArray2 = Nd4j.create(input2, new int[]{nSamples,1});
        INDArray inputNDArray = Nd4j.hstack(inputNDArray1,inputNDArray2);
        INDArray outPut = Nd4j.create(max, new int[]{nSamples, 1});
        DataSet dataSet = new DataSet(inputNDArray, outPut);
        List<DataSet> listDs = dataSet.asList();
        Collections.shuffle(listDs,rng);
        return new ListDataSetIterator(listDs,batchSize);

    }
}
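For readers who want to try the same experiment without DL4J, here is a minimal dependency-free Java sketch of a comparable setup: 2 inputs, 10 tanh hidden units, an identity output and MSE loss on max(x1, x2) with x1, x2 drawn from [0, 3]. It is a stand-in under stated assumptions, not the code above: it uses plain per-sample SGD instead of Nesterov momentum with minibatches, and the class name MaxSgdSketch, the sample count and the learning rate are illustrative choices.

```java
import java.util.Arrays;
import java.util.Random;

// Dependency-free sketch: a 2-10-1 network (tanh hidden layer, identity output, MSE loss)
// trained on max(x1, x2) with x1, x2 drawn from [0, 3].
// Assumption: plain per-sample SGD stands in for DL4J's Nesterov-momentum minibatch updates.
final class MaxSgdSketch {
    static final int N_HIDDEN = 10;
    static final Random RNG = new Random(12345);

    // Parameters: input->hidden weights and biases, hidden->output weights and bias.
    static final double[][] w1 = new double[N_HIDDEN][2];
    static final double[] b1 = new double[N_HIDDEN];
    static final double[] w2 = new double[N_HIDDEN];
    static double b2 = 0.0;

    // Hidden activations cached by forward() for use in backprop.
    static final double[] hidden = new double[N_HIDDEN];

    static double forward(double x1, double x2) {
        double out = b2;
        for (int h = 0; h < N_HIDDEN; h++) {
            hidden[h] = Math.tanh(w1[h][0] * x1 + w1[h][1] * x2 + b1[h]);
            out += w2[h] * hidden[h];
        }
        return out;
    }

    // Mean squared error over a data set.
    static double mse(double[][] xs, double[] ys) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double err = forward(xs[i][0], xs[i][1]) - ys[i];
            sum += err * err;
        }
        return sum / xs.length;
    }

    // Trains from scratch and returns {initial MSE, final MSE} on the training data.
    static double[] train(int nSamples, int nEpochs, double lr) {
        // Glorot/Xavier uniform initialisation for weights; biases start at zero.
        double lim1 = Math.sqrt(6.0 / (2 + N_HIDDEN));
        double lim2 = Math.sqrt(6.0 / (N_HIDDEN + 1));
        for (int h = 0; h < N_HIDDEN; h++) {
            w1[h][0] = (2 * RNG.nextDouble() - 1) * lim1;
            w1[h][1] = (2 * RNG.nextDouble() - 1) * lim1;
            w2[h] = (2 * RNG.nextDouble() - 1) * lim2;
        }
        Arrays.fill(b1, 0.0);
        b2 = 0.0;

        // Training data: targets are max(x1, x2).
        double[][] xs = new double[nSamples][2];
        double[] ys = new double[nSamples];
        for (int i = 0; i < nSamples; i++) {
            xs[i][0] = 3 * RNG.nextDouble();
            xs[i][1] = 3 * RNG.nextDouble();
            ys[i] = Math.max(xs[i][0], xs[i][1]);
        }

        double initial = mse(xs, ys);
        for (int epoch = 0; epoch < nEpochs; epoch++) {
            for (int i = 0; i < nSamples; i++) {
                double x1 = xs[i][0], x2 = xs[i][1];
                // Gradient of (out - y)^2 / 2 with respect to the output.
                double g = forward(x1, x2) - ys[i];
                for (int h = 0; h < N_HIDDEN; h++) {
                    // Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2 (uses w2 before its update).
                    double gz = g * w2[h] * (1 - hidden[h] * hidden[h]);
                    w2[h] -= lr * g * hidden[h];
                    w1[h][0] -= lr * gz * x1;
                    w1[h][1] -= lr * gz * x2;
                    b1[h] -= lr * gz;
                }
                b2 -= lr * g;
            }
        }
        return new double[] { initial, mse(xs, ys) };
    }

    public static void main(String[] args) {
        double[] losses = train(2000, 200, 0.005);
        System.out.println("initial MSE = " + losses[0] + ", final MSE = " + losses[1]);
        System.out.println("net(0.68, 0.31) = " + forward(0.68, 0.31));
        System.out.println("net(1.18, 2.55) = " + forward(1.18, 2.55));
    }
}
```

As a reference point, a network that always outputs the mean target (2 on [0, 3]) has an MSE of 0.5, so the final MSE printed by main() shows how much of the nonlinearity the hidden layer has actually picked up.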

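I just want to point out: if you use relu instead of tanh, there is actually an exact solution, and I'd guess that if you shrink the network to exactly this size (1 hidden layer, 3 nodes), you would always end up with these weights (modulo permutation of the nodes and scaling of the weights, with the first layer scaled by some gamma > 0 and the second layer by 1/gamma):
max(a,b) = relu([a, b] * [[1, 0, 0], [-1, 1, -1]]) * [[1], [1], [-1]]
where * is matrix multiplication.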
此方程式将以下人类可读版本转换为NN语言：
max(a,b) = relu(a-b) + b = relu(a-b) + relu(b) - relu(-b)
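The identity is easy to check numerically. A minimal pure-Java sketch (hypothetical class name ExactReluMax, all biases zero) that hard-codes the two weight matrices implied by max(a,b) = relu(a-b) + relu(b) - relu(-b) and evaluates relu([a, b] * W1) * W2:

```java
// Sketch: the exact ReLU network for max(a, b), built from
// max(a,b) = relu(a-b) + relu(b) - relu(-b). All biases are zero.
final class ExactReluMax {
    // First layer (rows = inputs a, b; columns = 3 hidden units computing a-b, b, -b).
    static final double[][] W1 = { { 1, 0, 0 }, { -1, 1, -1 } };
    // Second layer (one weight per hidden unit): +1, +1, -1.
    static final double[] W2 = { 1, 1, -1 };

    static double relu(double z) {
        return Math.max(0.0, z);
    }

    // Computes relu([a, b] * W1) * W2.
    static double forward(double a, double b) {
        double out = 0.0;
        for (int j = 0; j < W2.length; j++) {
            out += W2[j] * relu(a * W1[0][j] + b * W1[1][j]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] inputs = { { 0.68, 0.31 }, { 1.18, 2.55 }, { -2.0, -5.0 }, { -1.5, 4.0 } };
        for (double[] p : inputs) {
            System.out.println("max(" + p[0] + ", " + p[1] + ") = " + forward(p[0], p[1]));
        }
    }
}
```

Because relu(b) - relu(-b) = b for every real b, this network reproduces max exactly for any real inputs, not only for inputs restricted to an interval such as [0,3].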

I haven't actually tested it; my point is that, in theory, it should be very easy for the network to learn this function.
Edit:
I just tested this, and the result is what I expected:
[[-1.0714666e+00 -7.9943770e-01  9.0549403e-01]
 [ 1.0714666e+00 -7.7552663e-08  2.6146751e-08]]

and

[[ 0.93330014]
 [-1.250879  ]
 [ 1.1043695 ]]

which are the first and second layers, respectively. Transposing the second layer and multiplying it element-wise with the first set of weights gives a normalized version that is easy to compare with my theoretical result:
[[-9.9999988e-01  9.9999988e-01  1.0000000e+00]
 [ 9.9999988e-01  9.7009000e-08  2.8875675e-08]]
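The reported weights can be checked directly with a minimal pure-Java sketch (hypothetical class name LearnedReluMax). It hard-codes the two learned layers, assumes zero biases because the answer does not report them, evaluates relu([a, b] * W1) * W2, and forms the element-wise product of W1 with the transposed second layer. Read this way, the learned network implements the mirrored identity max(a,b) = relu(b-a) + relu(a) - relu(-a), i.e. the theoretical solution with a and b swapped and the hidden units permuted.

```java
import java.util.Arrays;

// Sketch: evaluates the learned network reported above and normalizes its weights.
// Assumption: all biases are zero (they are not listed in the answer).
final class LearnedReluMax {
    // First layer as printed: rows = inputs (a, b), columns = 3 hidden ReLU units.
    static final double[][] W1 = {
        { -1.0714666e+00, -7.9943770e-01, 9.0549403e-01 },
        { 1.0714666e+00, -7.7552663e-08, 2.6146751e-08 }
    };
    // Second layer as printed: one weight per hidden unit.
    static final double[] W2 = { 0.93330014, -1.250879, 1.1043695 };

    static double relu(double z) {
        return Math.max(0.0, z);
    }

    // relu([a, b] * W1) * W2 with zero biases.
    static double forward(double a, double b) {
        double out = 0.0;
        for (int j = 0; j < W2.length; j++) {
            out += W2[j] * relu(a * W1[0][j] + b * W1[1][j]);
        }
        return out;
    }

    // Element-wise product W1[i][j] * W2[j] (second layer transposed, broadcast over rows).
    static double[][] normalized() {
        double[][] n = new double[W1.length][W2.length];
        for (int i = 0; i < W1.length; i++) {
            for (int j = 0; j < W2.length; j++) {
                n[i][j] = W1[i][j] * W2[j];
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(normalized()));
        // Worst deviation from Math.max on a grid over [-3, 3] x [-3, 3].
        double worst = 0.0;
        for (double a = -3.0; a <= 3.0; a += 0.25) {
            for (double b = -3.0; b <= 3.0; b += 0.25) {
                worst = Math.max(worst, Math.abs(forward(a, b) - Math.max(a, b)));
            }
        }
        System.out.println("worst |forward - max| on the grid: " + worst);
    }
}
```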