Java Seq2Seq model (DL4J) makes nonsensical predictions

java machine-learning

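I am trying to implement a Seq2Seq prediction model in DL4J. What I ultimately want is to use a time series of INPUT_SIZE data points to predict the following time series of OUTPUT_SIZE data points. Each data point has numFeatures features. DL4J has some example code that explains how to implement a very basic Seq2Seq model, and I have made some progress extending that example to my own needs. The model below compiles, but the predictions it makes are nonsensical.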

ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
    .weightInit(WeightInit.XAVIER)
    .updater(new Adam(0.25))
    .seed(42)
    .graphBuilder()
    .addInputs("in_data", "last_in")
    .setInputTypes(InputType.recurrent(numFeatures), InputType.recurrent(numFeatures))
    //The inputs to the encoder will have size = minibatch x featuresize x timesteps
    //Note that the network only knows of the feature vector size. It does not know how many time steps unless it sees an instance of the data
    .addLayer("encoder", new LSTM.Builder().nIn(numFeatures).nOut(hiddenLayerWidth).activation(Activation.LEAKYRELU).build(), "in_data")
    //Create a vertex indicating the very last time step of the encoder layer needs to be directed to other places in the comp graph
    .addVertex("lastTimeStep", new LastTimeStepVertex("in_data"), "encoder")
    //Create a vertex that allows the duplication of 2d input to a 3d input
    //In this case the last time step of the encoder layer (viz. 2d) is duplicated to the length of the timeseries "last_in" which is an input to the comp graph
    //Refer to the javadoc for more detail
    .addVertex("duplicateTimeStep", new DuplicateToTimeSeriesVertex("last_in"), "lastTimeStep")
    //The inputs to the decoder will have size = size of output of last timestep of encoder (hiddenLayerWidth) + size of the other input to the comp graph, last_in (numFeatures)
    .addLayer("decoder", new LSTM.Builder().nIn(numFeatures + hiddenLayerWidth).nOut(hiddenLayerWidth).activation(Activation.LEAKYRELU).build(), "last_in","duplicateTimeStep")
    .addLayer("output", new RnnOutputLayer.Builder().nIn(hiddenLayerWidth).nOut(numFeatures).activation(Activation.LEAKYRELU).lossFunction(LossFunctions.LossFunction.MSE).build(), "decoder")
    .setOutputs("output")
    .build();

ComputationGraph net = new ComputationGraph(configuration);
net.init();
net.setListeners(new ScoreIterationListener(1));

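The way I construct the input/label data is as follows: I split the input data into the first INPUT_SIZE - 1 time series observations (corresponding to the in_data input of the computation graph) and the last time series observation (corresponding to the last_in input). The label is the single next time step; to make predictions, I simply call net.output() OUTPUT_SIZE times to get all the predictions I want. To make this clearer, here is how I initialize my inputs/labels: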

INDArray[] input = new INDArray[] {Nd4j.zeros(batchSize, numFeatures, INPUT_SIZE - 1), Nd4j.zeros(batchSize, numFeatures, 1)};
INDArray[] labels = new INDArray[] {Nd4j.zeros(batchSize, numFeatures, 1)};
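To make the split described above concrete, here is a minimal plain-Java sketch for a single example (no minibatch dimension). The class WindowSplitter, its Window holder, and the [timeStep][feature] layout of the raw series are all assumptions made for illustration; they are not part of DL4J or of my actual code. The outputs use a [feature][timeStep] layout so that they mirror the featuresize x timesteps shape of the arrays above.

```java
// Hypothetical helper (not a DL4J class): splits one window of a series into
// the two network inputs and the one-step label described in the question.
final class WindowSplitter {

    /** Arrays in [feature][timeStep] layout for a single example. */
    static final class Window {
        final double[][] inData; // numFeatures x (inputSize - 1)
        final double[][] lastIn; // numFeatures x 1
        final double[][] label;  // numFeatures x 1

        Window(double[][] inData, double[][] lastIn, double[][] label) {
            this.inData = inData;
            this.lastIn = lastIn;
            this.label = label;
        }
    }

    /**
     * @param series    raw observations in [timeStep][feature] layout (assumed)
     * @param start     index of the first observation of the window
     * @param inputSize number of observations fed to the network (INPUT_SIZE)
     */
    static Window split(double[][] series, int start, int inputSize) {
        if (inputSize < 2) {
            throw new IllegalArgumentException("inputSize must be at least 2");
        }
        // The window needs inputSize observations plus one more for the label.
        if (start < 0 || start + inputSize >= series.length) {
            throw new IllegalArgumentException("window plus label does not fit in the series");
        }
        int numFeatures = series[0].length;
        double[][] inData = new double[numFeatures][inputSize - 1];
        double[][] lastIn = new double[numFeatures][1];
        double[][] label = new double[numFeatures][1];
        for (int f = 0; f < numFeatures; f++) {
            // First inputSize - 1 observations go to "in_data".
            for (int t = 0; t < inputSize - 1; t++) {
                inData[f][t] = series[start + t][f];
            }
            // The last observation of the window goes to "last_in".
            lastIn[f][0] = series[start + inputSize - 1][f];
            // The observation right after the window is the label.
            label[f][0] = series[start + inputSize][f];
        }
        return new Window(inData, lastIn, label);
    }
}
```

Copying these values into the INDArray inputs and labels is a separate step that is not shown here.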

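I believe my problem comes from a mistake in the architecture of my computation graph rather than from how I prepare the data, make predictions, or anything else, because I have completed other small projects with simpler architectures without any problems.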
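For reference, the multi-step prediction procedure mentioned earlier (calling the network OUTPUT_SIZE times) can be sketched in plain Java as below. This assumes that each prediction is fed back in as the newest observation before the next call, which is one common way to roll a one-step model forward. The OneStepModel interface is a hypothetical stand-in for a call to net.output(); it is not a DL4J type.

```java
// Hypothetical sketch (no DL4J types): roll a one-step predictor forward
// outputSize times, feeding each prediction back in as the newest observation.
final class Rollout {

    /** Stand-in for one call to the trained network. */
    interface OneStepModel {
        /**
         * @param inData first inputSize - 1 observations, [timeStep][feature]
         * @param lastIn most recent observation, [feature]
         * @return predicted next observation, [feature]
         */
        double[] predictNext(double[][] inData, double[] lastIn);
    }

    static double[][] forecast(OneStepModel model, double[][] history, int outputSize) {
        int inputSize = history.length;
        if (inputSize < 2) {
            throw new IllegalArgumentException("history must hold at least 2 observations");
        }
        // Work on a copy so the caller's history is left untouched.
        double[][] window = new double[inputSize][];
        for (int t = 0; t < inputSize; t++) {
            window[t] = history[t].clone();
        }
        double[][] predictions = new double[outputSize][];
        for (int k = 0; k < outputSize; k++) {
            double[][] inData = new double[inputSize - 1][];
            for (int t = 0; t < inputSize - 1; t++) {
                inData[t] = window[t];
            }
            double[] next = model.predictNext(inData, window[inputSize - 1]).clone();
            predictions[k] = next;
            // Slide the window: drop the oldest observation, append the prediction.
            for (int t = 0; t < inputSize - 1; t++) {
                window[t] = window[t + 1];
            }
            window[inputSize - 1] = next;
        }
        return predictions;
    }
}
```

With this scheme any error in an early prediction is carried into every later step, so it helps to check the very first predicted step on its own before looking at the full rollout.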

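My data is standardized to have a mean of 0 and a standard deviation of 1, so most entries should be close to 0. However, most of the predictions I get have absolute values far from zero (on the order of tens to hundreds), which is clearly wrong. I have been working on this for a while but have not been able to find the problem; any suggestions on how to fix it would be greatly appreciated.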
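The standardization described above is z = (x - mean) / std per feature. A minimal plain-Java sketch of it, together with a simple check for how many predicted values fall far outside the expected range, might look like this. The class name Standardizer, the use of the population standard deviation, and the threshold passed to fractionOutside are all assumptions chosen for illustration.

```java
// Hypothetical helper: per-feature standardization and a simple range check.
final class Standardizer {

    /** Per-feature mean of a series in [timeStep][feature] layout. */
    static double[] mean(double[][] series) {
        double[] m = new double[series[0].length];
        for (double[] row : series) {
            for (int f = 0; f < m.length; f++) {
                m[f] += row[f];
            }
        }
        for (int f = 0; f < m.length; f++) {
            m[f] /= series.length;
        }
        return m;
    }

    /** Per-feature population standard deviation. */
    static double[] std(double[][] series, double[] mean) {
        double[] s = new double[mean.length];
        for (double[] row : series) {
            for (int f = 0; f < s.length; f++) {
                double d = row[f] - mean[f];
                s[f] += d * d;
            }
        }
        for (int f = 0; f < s.length; f++) {
            s[f] = Math.sqrt(s[f] / series.length);
        }
        return s;
    }

    /** Returns a standardized copy; constant features are mapped to 0. */
    static double[][] standardize(double[][] series) {
        double[] m = mean(series);
        double[] s = std(series, m);
        double[][] out = new double[series.length][m.length];
        for (int t = 0; t < series.length; t++) {
            for (int f = 0; f < m.length; f++) {
                double scale = (s[f] == 0.0) ? 1.0 : s[f]; // avoid division by zero
                out[t][f] = (series[t][f] - m[f]) / scale;
            }
        }
        return out;
    }

    /** Fraction of entries whose absolute value exceeds the threshold. */
    static double fractionOutside(double[][] values, double threshold) {
        int total = 0;
        int outside = 0;
        for (double[] row : values) {
            for (double v : row) {
                total++;
                if (Math.abs(v) > threshold) {
                    outside++;
                }
            }
        }
        return total == 0 ? 0.0 : (double) outside / total;
    }
}
```

For standardized data, only a small fraction of entries would normally exceed a threshold of around 3, so a large fraction among the predictions is a quick numerical confirmation of the symptom described above.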

Other resources I have used: the DL4J example Seq2Seq model; the relevant part starts at line 88 of that example.

The computation graph documentation; I have read through it in detail to see whether I could spot a mistake, but to no avail.

Did you ever identify the error?