Apache Spark: converting a Seq[(String, Any)] to Seq[(String, org.apache.spark.ml.PredictionModel[_, _])]


I trained several models on my dataset: nbModel, dtModel, rfModel and gbmModel. All of these are machine learning models.

Now, when I store them in a variable with

val models = Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel), ("GBM",gbmModel))
I get a Seq[(String, Any)].

For a single model, such as nbModel,

 val models = ("NB", nbModel)
Output:
models: (String, org.apache.spark.ml.classification.NaiveBayesModel) = (NB,NaiveBayesModel (uid=NB_c35f79982850) with 2 classes)

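When I try to combine the prediction columns of several of these models, I get a type mismatch error:

val mlTrainData= mlData(transferData, "value", models).drop("row_id")
:75: error: type mismatch;
 found   : Seq[(String, Any)]
 required: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])]
       val mlTrainData= mlData(transferData, "value", models).drop("row_id")

My mlData function is: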

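import org.apache.spark.ml.PredictionModel
import org.apache.spark.sql.DataFrame

// Collects each base model's predictions into one DataFrame, joined on row_id.
def mlData(inputData: DataFrame, responseColumn: String,
           baseModels: Seq[(String, PredictionModel[_, _])]): DataFrame = {
  baseModels.map { case (name, model) =>
    model.transform(inputData)
      .select("row_id", model.getPredictionCol)
      // Rename the model's actual prediction column ("prediction" by default).
      .withColumnRenamed(model.getPredictionCol, s"${name}_prediction")
  }.reduceLeft((a, b) => a.join(b, Seq("row_id"), "inner"))
    .join(inputData.select("row_id", responseColumn), Seq("row_id"), "inner")
}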
Output:
mlData: (inputData: org.apache.spark.sql.DataFrame, responseColumn: String, baseModels: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])])org.apache.spark.sql.DataFrame

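Can you replace the code

val models = Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel), ("GBM",gbmModel))

with

val models = Seq(("NB", nbModel), ("DT", null : org.apache.spark.mllib.tree.model.DecisionTreeModel), ("RF", rfModel), ("GBM",gbmModel))

What I mean is that your dtModel has been assigned (), which is of type Unit. As a result, the element type of the whole Seq becomes the common supertype of the model classes and Unit, which is Any. You need to make sure that dtModel really has a decision tree model type. If it is null and you have handled the null case, that is fine; an empty DecisionTreeModel would also work.

Note that mlData requires org.apache.spark.ml.PredictionModel, which the mllib DecisionTreeModel does not extend. With the ml API the type ascription would need an ml model class instead, for example org.apache.spark.ml.classification.DecisionTreeClassificationModel if dtModel is a classifier.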
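
The widening to Any described above can be reproduced without Spark. The following is a minimal sketch, not the asker's actual setup: the classes PredictionModel, NaiveBayesModel and RandomForestModel are hypothetical stand-ins for the Spark types (the real PredictionModel has a stricter type bound), and the object name ModelSeqDemo is made up for illustration. It shows a Unit value sneaking into the Seq, and one possible way to narrow a Seq[(String, Any)] back to a typed Seq with collect and a type pattern.

```scala
// Hypothetical stand-ins for Spark's model hierarchy, so the sketch
// compiles and runs without a Spark dependency.
abstract class PredictionModel[F, M] { def uid: String }
final class NaiveBayesModel(val uid: String)
  extends PredictionModel[Vector[Double], NaiveBayesModel]
final class RandomForestModel(val uid: String)
  extends PredictionModel[Vector[Double], RandomForestModel]

object ModelSeqDemo {
  val nbModel = new NaiveBayesModel("NB_1")
  val rfModel = new RandomForestModel("RF_1")

  // A block that ends in a side-effecting statement evaluates to (),
  // so dtModel has type Unit instead of a model type.
  val dtModel = { println("training decision tree") }

  // The only common supertype of the model classes and Unit is Any,
  // so the element type is (String, Any).
  val mixed: Seq[(String, Any)] =
    Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel))

  // Narrow back to a typed Seq: keep only entries that really are models.
  val models: Seq[(String, PredictionModel[_, _])] =
    mixed.collect { case (name, m: PredictionModel[_, _]) => (name, m) }

  // Names of the entries that were not models, useful for diagnosing the cause.
  val dropped: Seq[String] =
    mixed.collect {
      case (name, other) if !other.isInstanceOf[PredictionModel[_, _]] => name
    }
}
```

With Spark on the classpath, the same collect pattern should work against org.apache.spark.ml.PredictionModel[_, _]. Filtering silently drops the broken entry, though, so it is usually better to fix the definition of dtModel itself. Annotating the value as val models: Seq[(String, PredictionModel[_, _])] = Seq(...) should also make the compiler report the mismatch at the offending element rather than at the later call to mlData.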