MLlib RandomForest (Spark 2.0): predicting a single vector

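After training a random forest regressor inside a PipelineModel with the DataFrame-based ml API (Spark 2.0), I load the saved model into my real-time environment so that each request can be scored with it. Every request is processed and transformed through the loaded PipelineModel, but to do that I have to turn the single request vector into a one-row DataFrame with spark.createDataFrame. All of this takes about 700 ms,

compared with 2.5 ms when using the RDD-based mllib RandomForestRegressor.predict(VECTOR).

Is there a way to use the new ml API to predict a single vector without converting it to a DataFrame, or something else I can do to speed this up?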

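The DataFrame-based org.apache.spark.ml.regression.RandomForestRegressionModel also takes a Vector as input, so I don't think you need to convert the vector into a DataFrame on every call.

Here is how I think your code should work: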
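    import org.apache.spark.ml.linalg.Vector
    import org.apache.spark.ml.regression.RandomForestRegressionModel

    // Load the trained RF model
    val rfModel = RandomForestRegressionModel.load("path")
    // predictionData: a DataFrame containing a column "feature" of type Vector
    // Caveat: predict(Vector) is public only from Spark 3.0; in Spark 2.x it is protected
    val predictions = predictionData.rdd.map { row =>
        val feature = row.getAs[Vector]("feature")
        val result = rfModel.predict(feature)
        (feature, result)
    }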
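
For the PipelineModel case in the question, one possible approach is to pull the fitted random forest stage out of the loaded pipeline once and then call predict on each request vector directly. The sketch below is only an illustration: the object name SingleVectorPrediction, the helpers forestStage and fitToyPipeline, and the toy training data are all made up here, and the tiny pipeline fitted in local mode merely stands in for PipelineModel.load. It assumes Spark 3.0 or later, where predict(Vector) is public (in Spark 2.x it is protected), and it assumes the request vector is already in the form the forest expects, because any earlier pipeline stages are bypassed.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.regression.{RandomForestRegressionModel, RandomForestRegressor}
import org.apache.spark.sql.SparkSession

object SingleVectorPrediction {

  // Return the fitted random forest stage of a pipeline, if it has one.
  def forestStage(pipeline: PipelineModel): Option[RandomForestRegressionModel] =
    pipeline.stages.collectFirst { case rf: RandomForestRegressionModel => rf }

  // Hypothetical helper: fits a tiny pipeline on made-up data so the
  // example runs on its own. In the real setup this would be
  // PipelineModel.load(...) on the saved model.
  def fitToyPipeline(spark: SparkSession): PipelineModel = {
    val training = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 0.0)),
      (1.0, Vectors.dense(0.0, 1.0)),
      (2.0, Vectors.dense(1.0, 0.0)),
      (3.0, Vectors.dense(1.0, 1.0))
    )).toDF("label", "features")

    val forest = new RandomForestRegressor().setNumTrees(5).setSeed(42L)
    new Pipeline().setStages(Array(forest)).fit(training)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("single-vector-prediction")
      .getOrCreate()

    val pipelineModel = fitToyPipeline(spark)

    // Look the stage up once, outside the per-request path.
    val rf = forestStage(pipelineModel)
      .getOrElse(sys.error("no RandomForestRegressionModel stage found"))

    // Score one request vector without building a DataFrame.
    // Requires Spark 3.0+, where predict(Vector) is public.
    val request: Vector = Vectors.dense(0.5, 0.5)
    val prediction: Double = rf.predict(request)
    println(s"prediction = $prediction")

    spark.stop()
  }
}
```

Doing the stage lookup once at startup keeps DataFrame creation out of the per-request path; whether that gets close to the 2.5 ms figure from the question would have to be measured.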