Java Apache Ignite: updating a previously trained ML model

java machine-learning ignite

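I have a dataset that I use to train a KNN model. Later, I want to update the model with new training data. What I see is that the updated model only takes the new training data into account and ignores the data it was trained on before.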

        Vectorizer                                     vec             = new DummyVectorizer<Integer>(1, 2).labeled(0);
        DatasetTrainer<KNNClassificationModel, Double> trainer         = new KNNClassificationTrainer();
        KNNClassificationModel                         model;
        KNNClassificationModel                         modelUpdated;
        Map<Integer, Vector>                           trainingData    = new HashMap<Integer, Vector>();
        Map<Integer, Vector>                           trainingDataNew = new HashMap<Integer, Vector>();

        Double[][] data1 = new Double[][] {
            {0.136,0.644,0.154},
            {0.302,0.634,0.779},
            {0.806,0.254,0.211},
            {0.241,0.951,0.744},
            {0.542,0.893,0.612},
            {0.334,0.277,0.486},
            {0.616,0.259,0.121},
            {0.738,0.585,0.017},
            {0.124,0.567,0.358},
            {0.934,0.346,0.863}};

        Double[][] data2 = new Double[][] {
            {0.300,0.236,0.193}};
            
        Double[] observationData = new Double[] { 0.8, 0.7 };
            
        // fill dataset (in cache)
        for (int i = 0; i < data1.length; i++)
            trainingData.put(i, new DenseVector(data1[i]));

        // first training / prediction
        model = trainer.fit(trainingData, 1, vec);
        System.out.println("First prediction : " + model.predict(new DenseVector(observationData)));

        // new training data
        for (int i = 0; i < data2.length; i++)
            trainingDataNew.put(data1.length + i, new DenseVector(data2[i]));

        // second training / prediction
        modelUpdated = trainer.update(model, trainingDataNew, 1, vec);
        System.out.println("Second prediction: " + modelUpdated.predict(new DenseVector(observationData)));

It looks as if the second prediction uses only data2, which necessarily yields 0.3 as the prediction.
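
To see why a model that holds only data2 must predict 0.3, here is a minimal plain-Java sketch of nearest-neighbour prediction. It is not Ignite's implementation: the class `KnnSketch`, its methods, and the choice of k = 1 are assumptions made purely for illustration. Rows use the layout from the question (label at index 0, two features after it).

```java
import java.util.Comparator;
import java.util.List;

class KnnSketch {
    // Rows copied from the question: {label, feature1, feature2}.
    static final List<double[]> DATA1 = List.of(
        new double[] {0.136, 0.644, 0.154},
        new double[] {0.302, 0.634, 0.779},
        new double[] {0.806, 0.254, 0.211},
        new double[] {0.241, 0.951, 0.744},
        new double[] {0.542, 0.893, 0.612},
        new double[] {0.334, 0.277, 0.486},
        new double[] {0.616, 0.259, 0.121},
        new double[] {0.738, 0.585, 0.017},
        new double[] {0.124, 0.567, 0.358},
        new double[] {0.934, 0.346, 0.863});

    static final List<double[]> DATA2 = List.of(new double[] {0.300, 0.236, 0.193});

    /** Squared Euclidean distance between a row's features (index 1 onwards) and a query. */
    static double squaredDistance(double[] row, double[] query) {
        double sum = 0;
        for (int i = 0; i < query.length; i++) {
            double d = row[i + 1] - query[i];
            sum += d * d;
        }
        return sum;
    }

    /** Hypothetical 1-nearest-neighbour predictor: returns the label of the closest row. */
    static double predictNearest(List<double[]> rows, double[] query) {
        double[] nearest = rows.stream()
            .min(Comparator.comparingDouble((double[] r) -> squaredDistance(r, query)))
            .orElseThrow(() -> new IllegalArgumentException("empty training set"));
        return nearest[0];
    }

    public static void main(String[] args) {
        double[] observation = {0.8, 0.7};
        System.out.println("data1 only: " + predictNearest(DATA1, observation)); // prints "data1 only: 0.542"
        System.out.println("data2 only: " + predictNearest(DATA2, observation)); // prints "data2 only: 0.3"
    }
}
```

With a single training row there is only one possible neighbour, so every query returns that row's label no matter which k is configured.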

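How does the model update work? If I have to add data2 to data1 and then train on data1 again, how would that differ from a completely fresh training on all of the combined data?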

How does the model update work?
For KNN specifically: add data2 to data1 and call the model update on the combined data.
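
The advice to combine the two datasets can be sketched in plain Java as follows. The class `MergeTrainingData` and its `merge` helper are hypothetical names, and the `double[]` values stand in for the `Vector` values used in the question. The helper rejects colliding keys, because a plain `putAll` would silently overwrite old rows that share a key with a new one.

```java
import java.util.HashMap;
import java.util.Map;

class MergeTrainingData {
    /** Returns a new map holding the entries of both maps; fails if a key occurs in both. */
    static <K, V> Map<K, V> merge(Map<K, V> oldData, Map<K, V> newData) {
        Map<K, V> combined = new HashMap<>(oldData);
        for (Map.Entry<K, V> e : newData.entrySet()) {
            // putIfAbsent returns the existing value when the key is already present.
            if (combined.putIfAbsent(e.getKey(), e.getValue()) != null)
                throw new IllegalArgumentException("Duplicate key: " + e.getKey());
        }
        return combined;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> trainingData = new HashMap<>();
        trainingData.put(0, new double[] {0.136, 0.644, 0.154});
        trainingData.put(1, new double[] {0.302, 0.634, 0.779});

        // New rows get keys offset by the old size, as in the question's second loop.
        Map<Integer, double[]> trainingDataNew = new HashMap<>();
        trainingDataNew.put(trainingData.size(), new double[] {0.300, 0.236, 0.193});

        Map<Integer, double[]> combined = merge(trainingData, trainingDataNew);
        System.out.println("combined rows: " + combined.size()); // prints "combined rows: 3"

        // In the question's code, a combined Map<Integer, Vector> would then be
        // passed to the trainer instead of trainingDataNew alone.
    }
}
```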

Take this test as an example and follow the steps it shows. First, set up your trainer:

   KNNClassificationTrainer trainer = new KNNClassificationTrainer()
            .withK(3)
            .withDistanceMeasure(new EuclideanDistance())
            .withWeighted(false);

Then set up the vectorizer (note how the label coordinate is created):

        model = trainer.fit(
            trainingData,
            parts,
            new DoubleArrayVectorizer<Integer>().labeled(Vectorizer.LabelCoordinate.LAST)
        );

Then update the model as needed:

        KNNClassificationModel updatedOnData = trainer.update(
            originalMdlOnEmptyDataset,
            newData,
            parts,
            new DoubleArrayVectorizer<Integer>().labeled(Vectorizer.LabelCoordinate.LAST)
        );

KNN classification documentation:

KNN classification example:

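Did you look at my code? I am doing pretty much the same thing as the test on GitHub, or at least it looks that way to me. In fact, even the behavior is the same: the test first trains with an empty HashMap, which leads to the same result I am seeing, namely that only the second set of training data is used. If I need to add data2 to data1 and then call modelUpdate, what is the point, since I am basically running a full training on all the data again? Or am I missing something?

I agree, updateModel (for KNN) needs to be fixed. Check the implementation: as you can see, the mdl parameter is not used, and the LocalDataSet is created from the new data only. I verified this with a few examples. I will report it to the project.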
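
The difference between the behaviour reported in the last comment and the behaviour the asker expected can be illustrated with a toy model. Everything here (`UpdateSketch`, `ToyKnnModel`, both update methods) is hypothetical and is not taken from Ignite's source; it only assumes that a KNN model is essentially the set of rows it has memorised.

```java
import java.util.ArrayList;
import java.util.List;

class UpdateSketch {
    /** Toy stand-in for a KNN model: nothing more than the rows it was trained on. */
    static final class ToyKnnModel {
        final List<double[]> rows;

        ToyKnnModel(List<double[]> rows) {
            this.rows = new ArrayList<>(rows);
        }
    }

    /** Mirrors the reported behaviour: the previous model is accepted but never read. */
    static ToyKnnModel updateIgnoringOld(ToyKnnModel previous, List<double[]> newRows) {
        return new ToyKnnModel(newRows);
    }

    /** What the asker expected: old rows are carried over and the new ones appended. */
    static ToyKnnModel updateKeepingOld(ToyKnnModel previous, List<double[]> newRows) {
        List<double[]> all = new ArrayList<>(previous.rows);
        all.addAll(newRows);
        return new ToyKnnModel(all);
    }

    public static void main(String[] args) {
        ToyKnnModel model = new ToyKnnModel(List.of(
            new double[] {0.136, 0.644, 0.154},
            new double[] {0.302, 0.634, 0.779}));
        List<double[]> newRows = List.of(new double[] {0.300, 0.236, 0.193});

        System.out.println(updateIgnoringOld(model, newRows).rows.size()); // prints 1
        System.out.println(updateKeepingOld(model, newRows).rows.size());  // prints 3
    }
}
```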
