Running SVD in Spark Scala


scala apache-spark


I have an RDD of words and their vector representations, and I ran an SVD on it by following an example.

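The SingularValueDecomposition class returns a RowMatrix. That matrix no longer carries the word each vector in the original RowMatrix was generated for. I don't know how to use the SingularValueDecomposition output, because it is just a reduced matrix with no word labels.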

Has anyone run into a similar problem?

I was able to solve it with the following steps:

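import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}
// Get the words and their vectors: fit a CountVectorizer on the tokenized text.
val cvModel: CountVectorizerModel = new CountVectorizer()
  .setInputCol("filteredWords").setOutputCol("features")
  .setVocabSize(100000).setMinDF(2)
  .fit(newSentenceData)
// Apply the fitted model to add the "features" vector column.
val fittedModel = cvModel.transform(newSentenceData)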

// Convert the DataFrame to an RDD, because the SVD library works on RDDs.
val rddVectorWithAllColumns = fittedModel.rdd

// The SVD code is omitted here; assume the variable svd holds the result (svd.U is only populated when computeU = true).
// Zip the rows of U with the word column (column 0) so that each reduced vector gets its word back.
// Note: zip requires both RDDs to have the same number of partitions and the same number of elements per partition.
val test = svd.U.rows.map(row => row.toArray)
  .zip(rddVectorWithAllColumns.map(row => row.getString(0)))
  .map(line => line._2 + "\t" + line._1.mkString("\t"))
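
The step omitted above (computing svd) and the re-labelling can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the original code: the object LabeledSvdSketch, the helpers toyData and reduceWithLabels, and the toy words and vectors are all hypothetical. It also assumes the vectors are already org.apache.spark.mllib.linalg.Vector values; vectors produced by the ml CountVectorizer would first need converting, for example with org.apache.spark.mllib.linalg.Vectors.fromML.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, for illustration only.
object LabeledSvdSketch {

  // Made-up (word, vector) pairs standing in for the real data.
  def toyData(sc: SparkContext): RDD[(String, Vector)] =
    sc.parallelize(Seq(
      ("apple",  Vectors.dense(1.0, 0.0, 2.0)),
      ("banana", Vectors.dense(0.0, 3.0, 1.0)),
      ("cherry", Vectors.dense(4.0, 1.0, 0.0)),
      ("date",   Vectors.dense(2.0, 2.0, 2.0))
    ), 2)

  // Returns each word paired with its row of U (the reduced representation).
  def reduceWithLabels(labeled: RDD[(String, Vector)], k: Int): RDD[(String, Vector)] = {
    // Cache so that the labels and the matrix rows are read from the same materialized data.
    labeled.cache()
    val mat = new RowMatrix(labeled.map(_._2))
    // computeU = true is needed; otherwise svd.U is not populated.
    val svd = mat.computeSVD(k, computeU = true)
    // The rows of U are derived from the input rows partition by partition,
    // so the partition layout matches, which is what zip requires.
    labeled.map(_._1).zip(svd.U.rows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("labeled-svd-sketch")
      .getOrCreate()
    val reduced = reduceWithLabels(toyData(spark.sparkContext), k = 2)
    // Print one tab-separated line per word: the word, then its reduced vector.
    reduced.collect().foreach { case (word, vec) =>
      println(word + "\t" + vec.toArray.mkString("\t"))
    }
    spark.stop()
  }
}
```

Keeping the word and the vector together in one RDD until just before building the RowMatrix makes it easier to see that both sides of the zip come from the same source. If the two RDDs were produced by different pipelines, joining on an index from zipWithIndex would be a more defensive alternative to zip.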