Java: Reloaded Spark model does not seem to work


java apache-spark


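I am training a model on data from a CSV file and saving it. That first step works fine. After saving the model, I try to load it and apply it to new data, but it does not work.

What is the problem?

Training Java file: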

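import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.DecisionTreeClassifier;
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
import org.apache.spark.ml.feature.HashingTF;
import org.apache.spark.ml.feature.IndexToString;
import org.apache.spark.ml.feature.StringIndexer;
import org.apache.spark.ml.feature.StringIndexerModel;
import org.apache.spark.ml.feature.Tokenizer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Cobj is the poster's bean with className and productName properties;
// savePath and num are defined elsewhere in the original program.
SparkConf sconf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("Test")
        .set("spark.sql.warehouse.dir", "D:/Temp/wh");
SparkSession spark = SparkSession.builder().appName("Java Spark").config(sconf).getOrCreate();

// Parse each CSV line: column 0 is the product name, column 1 is the class name
JavaRDD<Cobj> cRDD = spark.read().textFile("file:///C:/Temp/classifications1.csv").javaRDD()
        .map(new Function<String, Cobj>() {
            @Override
            public Cobj call(String line) throws Exception {
                String[] parts = line.split(",");
                Cobj c = new Cobj();
                c.setClassName(parts[1].trim());
                c.setProductName(parts[0].trim());
                return c;
            }
        });

Dataset<Row> mainDataset = spark.createDataFrame(cRDD, Cobj.class);

// StringIndexer: maps className to a numeric label; rows with unseen values are skipped
StringIndexer classIndexer = new StringIndexer()
        .setHandleInvalid("skip")
        .setInputCol("className")
        .setOutputCol("label");
// Fitted on the full dataset, separately from the indexer stage inside the pipeline
StringIndexerModel classIndexerModel = classIndexer.fit(mainDataset);

// Tokenizer: splits productName into words
Tokenizer tokenizer = new Tokenizer()
        .setInputCol("productName")
        .setOutputCol("words");

// HashingTF: turns the words into a feature vector
HashingTF hashingTF = new HashingTF()
        .setInputCol(tokenizer.getOutputCol())
        .setOutputCol("features");

DecisionTreeClassifier decisionClassifier = new DecisionTreeClassifier()
        .setLabelCol("label")
        .setFeaturesCol("features");

// The pipeline includes the (unfitted) StringIndexer as its first stage
Pipeline pipeline = new Pipeline()
        .setStages(new PipelineStage[] {classIndexer, tokenizer, hashingTF, decisionClassifier});

Dataset<Row>[] splits = mainDataset.randomSplit(new double[] {0.8, 0.2});
Dataset<Row> train = splits[0];
Dataset<Row> test = splits[1];

PipelineModel pipelineModel = pipeline.fit(train);

Dataset<Row> result = pipelineModel.transform(test);
pipelineModel.write().overwrite().save(savePath + "DecisionTreeClassificationModel");

// Convert the numeric prediction back to a class name
IndexToString labelConverter = new IndexToString()
        .setInputCol("prediction")
        .setOutputCol("PredictedClassName")
        .setLabels(classIndexerModel.labels());
result = labelConverter.transform(result);
result.show(num, false);

Dataset<Row> predictionAndLabels = result.select("prediction", "label");
MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator()
        .setMetricName("accuracy");
System.out.println("Accuracy = " + evaluator.evaluate(predictionAndLabels));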
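
One detail of the training code above is worth illustrating: `classIndexerModel` is fitted on `mainDataset`, while the StringIndexer stage inside the pipeline is fitted on `train`. By default StringIndexer orders labels by descending frequency, so two separately fitted indexers are not guaranteed to produce the same index-to-label mapping. The following is a minimal plain-Java sketch of that effect, not Spark code. The class `LabelIndexSketch`, its methods, and the sample class names are hypothetical, and breaking ties alphabetically is an assumption made here for determinism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LabelIndexSketch {

    // Hypothetical stand-in for fitting a StringIndexer: labels sorted by
    // descending frequency; ties broken alphabetically (an assumption).
    static String[] fitLabels(List<String> classNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : classNames) {
            counts.merge(name, 1, Integer::sum);
        }
        List<String> labels = new ArrayList<>(counts.keySet());
        labels.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return labels.toArray(new String[0]);
    }

    // Hypothetical stand-in for IndexToString: look the index up in the labels array.
    static String indexToString(String[] labels, int prediction) {
        return labels[prediction];
    }

    public static void main(String[] args) {
        // Made-up class names: the full dataset and one possible training split of it
        List<String> full = Arrays.asList("shoes", "shoes", "shoes", "hats", "hats", "bags");
        List<String> train = Arrays.asList("shoes", "hats", "hats", "bags");

        String[] fullLabels = fitLabels(full);
        String[] trainLabels = fitLabels(train);

        // The same predicted index decodes to different class names
        System.out.println("full  -> " + indexToString(fullLabels, 0));
        System.out.println("train -> " + indexToString(trainLabels, 0));
    }
}
```

If the labels used for decoding come from the same fitted indexer that produced the `label` column during training, the mapping stays consistent.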

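I removed the StringIndexer from the pipeline and saved it separately as "StringIndexer".

In the second file, after loading the pipeline, I loaded the StringIndexer and used it to convert the predictions back to labels.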
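
The second file described above might look roughly like the sketch below. It is an illustration under assumptions, not the poster's actual code. The class name `PredictWithSavedModel`, the `savePath` value, and the input file `new_products.csv` (one product name per line) are hypothetical. It also assumes the training program saved the fitted indexer with `classIndexerModel.write().overwrite().save(savePath + "StringIndexer")` and fitted the pipeline, without the indexer stage, on data that already had the `label` column.

```java
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.feature.IndexToString;
import org.apache.spark.ml.feature.StringIndexerModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PredictWithSavedModel {
    public static void main(String[] args) {
        // Hypothetical location; must match the savePath used by the training program
        String savePath = "D:/Temp/models/";

        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("Predict")
                .getOrCreate();

        // Load the fitted pipeline (Tokenizer, HashingTF, DecisionTreeClassifier)
        PipelineModel pipelineModel =
                PipelineModel.load(savePath + "DecisionTreeClassificationModel");
        // Load the separately saved, fitted StringIndexer
        StringIndexerModel classIndexerModel =
                StringIndexerModel.load(savePath + "StringIndexer");

        // New, unlabeled data: each line of the (hypothetical) file is a product name
        Dataset<Row> newData = spark.read()
                .textFile("file:///C:/Temp/new_products.csv")
                .toDF("productName");

        Dataset<Row> predictions = pipelineModel.transform(newData);

        // Decode the numeric prediction with the labels of the loaded indexer
        IndexToString labelConverter = new IndexToString()
                .setInputCol("prediction")
                .setOutputCol("PredictedClassName")
                .setLabels(classIndexerModel.labels());

        labelConverter.transform(predictions)
                .select("productName", "PredictedClassName")
                .show(false);

        spark.stop();
    }
}
```

Keeping the indexer out of the saved pipeline means the prediction program only needs the `productName` column, and the labels used for decoding come from the same fitted indexer that produced the training labels.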
