Apache Spark: SparkML alternative to VectorAssembler

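I have a logistic regression SparkML pipeline in which one stage combines unigrams, bigrams, and trigrams. At the moment I use VectorAssembler to combine them. VectorAssembler seems to be very expensive: it roughly triples my prediction time. Any ideas for an alternative?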

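import org.apache.spark.ml.feature.{HashingTF, NGram, VectorAssembler}

val unigram = new NGram().setN(1).setInputCol("words").setOutputCol("unigram")
// Output column renamed to "tfFeaturesunigram" so it matches the assembler's input and does not clash with its output
val hashingTFunigram = new HashingTF().setInputCol(unigram.getOutputCol).setOutputCol("tfFeaturesunigram").setNumFeatures(5000)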

val bigram = new NGram().setN(2).setInputCol("words").setOutputCol("bigram")
val hashingTFbigram = new HashingTF().setInputCol(bigram.getOutputCol).setOutputCol("tfFeaturesbigram").setNumFeatures(5000)

val trigram = new NGram().setN(3).setInputCol("words").setOutputCol("trigram")
val hashingTFtrigram = new HashingTF().setInputCol(trigram.getOutputCol).setOutputCol("tfFeaturestrigram").setNumFeatures(5000)

val assembler = new VectorAssembler()
  .setInputCols(Array("tfFeaturesunigram", "tfFeaturesbigram", "tfFeaturestrigram"))
  .setOutputCol("tfFeatures")
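
One possible direction, given only as a minimal sketch and not as a measured fix: merge the n-gram arrays into a single array column and hash them once, so no VectorAssembler stage is needed. The sketch assumes Spark 2.4 or later, where the SQL `concat` function accepts array columns. The object name `SingleHashingTFSketch`, the `ngrams` column, and the toy input row are hypothetical. Note that this changes the feature space: all n-grams share one 15000-wide hash space instead of three separate 5000-wide blocks, so the model would have to be retrained, and whether prediction actually gets faster would need to be benchmarked.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{HashingTF, NGram, SQLTransformer}
import org.apache.spark.sql.SparkSession

object SingleHashingTFSketch {

  // Builds a feature pipeline that hashes all n-grams in one HashingTF stage.
  def buildPipeline(): Pipeline = {
    val bigram = new NGram().setN(2).setInputCol("words").setOutputCol("bigram")
    val trigram = new NGram().setN(3).setInputCol("words").setOutputCol("trigram")

    // Assumption: Spark 2.4+, where SQL concat works on array columns.
    // 1-grams are the tokens themselves, so "words" is used directly.
    val merge = new SQLTransformer()
      .setStatement("SELECT *, concat(words, bigram, trigram) AS ngrams FROM __THIS__")

    // A single hashing stage replaces three HashingTF stages plus the assembler.
    val hashingTF = new HashingTF()
      .setInputCol("ngrams")
      .setOutputCol("tfFeatures")
      .setNumFeatures(15000)

    new Pipeline().setStages(Array(bigram, trigram, merge, hashingTF))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ngram-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy input; the real pipeline supplies its own "words" column.
    val df = Seq(Seq("spark", "ml", "is", "fun")).toDF("words")

    buildPipeline().fit(df).transform(df).select("tfFeatures").show(truncate = false)
    spark.stop()
  }
}
```

The logistic regression stage would then read `tfFeatures` directly as its features column.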