Running SVD in Spark Scala

I have an RDD that holds words together with their vector representations. I followed the example below:

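The SingularValueDecomposition class returns a RowMatrix. That matrix does not carry the word for which each vector in the original RowMatrix was generated. I don't know how to use the SingularValueDecomposition output, because it is just a reduced matrix with no word labels.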


Has anyone run into a similar problem?
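
To make the problem concrete, here is a minimal sketch. It assumes a SparkSession is available and uses a tiny made-up set of (word, vector) pairs; the object name, the words, and the numbers are all hypothetical and not taken from the question. RowMatrix accepts only an RDD[Vector], so the words have to be dropped before the decomposition, and the rows of U come back as plain vectors with no key attached.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SvdWithoutLabels {
  // Hypothetical toy data: the words and vectors are made up for illustration.
  val wordVectors: Seq[(String, Vector)] = Seq(
    ("spark", Vectors.dense(1.0, 0.0, 2.0)),
    ("scala", Vectors.dense(0.0, 3.0, 1.0)),
    ("svd",   Vectors.dense(4.0, 1.0, 0.0)),
    ("rdd",   Vectors.dense(2.0, 2.0, 2.0))
  )

  // Returns the rows of U collected to the driver.
  // Note that the result type contains no word at all.
  def reducedRows(spark: SparkSession, k: Int): Array[Vector] = {
    val labelled: RDD[(String, Vector)] =
      spark.sparkContext.parallelize(wordVectors, 2)

    // RowMatrix takes RDD[Vector] only, so the labels are dropped here.
    val matrix = new RowMatrix(labelled.map(_._2))

    // computeU = true is needed, otherwise svd.U is null.
    val svd = matrix.computeSVD(k, computeU = true)
    svd.U.rows.collect()
  }
}
```

Calling `SvdWithoutLabels.reducedRows(spark, 2)` yields one reduced vector of length 2 per input word, but nothing in the result says which vector belongs to which word.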

I was able to solve it with the following steps:

import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}

// Fit a CountVectorizer on the filtered words to build the vocabulary.
val cvModel: CountVectorizerModel = new CountVectorizer()
  .setInputCol("filteredWords")
  .setOutputCol("features")
  .setVocabSize(100000)
  .setMinDF(2)
  .fit(newSentenceData)

// Apply the fitted model to add the "features" vector column.
val fittedModel = cvModel.transform(newSentenceData)

// Convert the DataFrame to an RDD, because the SVD library works on RDDs.
val rddVectorWithAllColumns = fittedModel.rdd

// The SVD step is omitted here; assume the variable svd holds the computed decomposition.
// Take the rows of U and zip the words back on, so each output line is the word followed by its reduced vector.
// This relies on column 0 being the word (a String) and on both RDDs having the same partitioning and row order, which RDD.zip requires.
val test = svd.U.rows
  .map(row => row.toArray)
  .zip(rddVectorWithAllColumns.map(row => row.getString(0)))
  .map { case (vector, word) => word + "\t" + vector.mkString("\t") }
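
The zip above works only while both RDDs keep the same partitioning and row order. A possible alternative, sketched here and not the answer author's method, is to give every row an explicit Long id and use IndexedRowMatrix, whose U matrix keeps that id on each row. The object name `LabelledSvd` and the method `reduce` are hypothetical names introduced for this sketch; the input is assumed to be an RDD of (word, vector) pairs using `org.apache.spark.mllib.linalg.Vector`.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object LabelledSvd {
  // Hypothetical helper: returns each word paired with its k-dimensional row of U.
  def reduce(words: RDD[(String, Vector)], k: Int): RDD[(String, Vector)] = {
    // Attach a Long id to every (word, vector) pair; cache so that the ids
    // are not recomputed between the SVD and the join.
    val indexed: RDD[(Long, (String, Vector))] =
      words.zipWithIndex().map(_.swap).cache()

    // Build the matrix from (id, vector); the word stays behind in `indexed`.
    val matrix = new IndexedRowMatrix(
      indexed.map { case (id, (_, vector)) => IndexedRow(id, vector) })

    // computeU = true is needed, otherwise svd.U is null.
    val svd = matrix.computeSVD(k, computeU = true)

    // Rows of U still carry the id, so the words can be joined back by key.
    val reducedById: RDD[(Long, Vector)] =
      svd.U.rows.map(row => (row.index, row.vector))

    indexed.mapValues(_._1).join(reducedById).values
  }
}
```

The result can then be formatted the same way as in the answer, for example with `LabelledSvd.reduce(words, k).map { case (word, v) => word + "\t" + v.toArray.mkString("\t") }`. The join costs a shuffle, which is the price for not depending on row order.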