Getting the word for a TF-IDF index using Scala
The words and their indices are not in any obvious order. For example, in document 0 the word "to" maps to index 388, but I don't know which word index 333 corresponds to. How can I get the word back from a rawFeatures index? With CountVectorizer I can use countVectorizerModel.vocabulary.
import org.apache.spark.ml.feature.{HashingTF, IDF}
import spark.implicits._
val df = spark.sparkContext.parallelize(Array(
(0, "to to Scala for better integration with Spark, and easier collaboration other".split(" ")),
(1, "For example in the case when the document is mostly about".split(" ")),
(2, "you need to to put some import declarations and create some data".split(" "))
)).toDF("id", "content")
val hashingTF = new HashingTF().setInputCol("content").setOutputCol("rawFeatures").setNumFeatures(2000)
val featurizedData = hashingTF.transform(df)
val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val idfModel = idf.fit(featurizedData)
val rescaleData = idfModel.transform(featurizedData)
rescaleData.show(false)
+---+------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|id |content |rawFeatures |features |
+---+------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|0 |[to, to, Scala, for, better, integration, with, Spark,, and, easier, collaboration, other]|(2000,[333,388,460,674,935,941,1036,1474,1534,1650,1988],[1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])|(2000,[333,388,460,674,935,941,1036,1474,1534,1650,1988],[0.28768207245178085,0.5753641449035617,0.6931471805599453,0.6931471805599453,0.6931471805599453,0.6931471805599453,0.6931471805599453,0.6931471805599453,0.6931471805599453,0.6931471805599453,0.6931471805599453])|
|1 |[For, example, in, the, case, when, the, document, is, mostly, about] |(2000,[342,956,1076,1243,1281,1445,1710,1760,1777,1820],[1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0]) |(2000,[342,956,1076,1243,1281,1445,1710,1760,1777,1820],[0.6931471805599453,0.6931471805599453,0.6931471805599453,0.6931471805599453,0.6931471805599453,0.6931471805599453,1.3862943611198906,0.6931471805599453,0.6931471805599453,0.6931471805599453]) |
|2 |[you, need, to, to, put, some, import, declarations, and, create, some, data] |(2000,[265,333,345,388,401,418,537,1400,1425,1695],[1.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,1.0,1.0]) |(2000,[265,333,345,388,401,418,537,1400,1425,1695],[0.6931471805599453,0.28768207245178085,0.6931471805599453,0.5753641449035617,0.6931471805599453,0.6931471805599453,0.6931471805599453,1.3862943611198906,0.6931471805599453,0.6931471805599453]) |
+---+------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
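Unlike CountVectorizer, HashingTF keeps no vocabulary: each term is hashed straight to an index, so the mapping is one-way and cannot be looked up directly. One workaround is to rebuild the mapping yourself: hash every distinct word in your corpus with the same hash function and collect an index-to-words table (several words can collide on one index). The sketch below illustrates the idea in plain Scala, using the standard library's MurmurHash3 as a stand-in hash; this is an assumption for illustration only, since Spark's HashingTF uses its own Murmur3 variant (seed 42) and will generally produce different indices. To get Spark's actual indices, compute them through the model itself, e.g. via the indexOf method on ml.feature.HashingTF (available in Spark 3.x).

```scala
import scala.util.hashing.MurmurHash3

object InvertHashingTF {
  val numFeatures = 2000

  // Stand-in for HashingTF's term hash. Spark hashes with its own
  // Murmur3 implementation (seed 42); the stdlib call here is an
  // assumption for illustration and will usually give different
  // indices than Spark's rawFeatures column.
  def termIndex(term: String): Int = {
    val h = MurmurHash3.stringHash(term, 42)
    ((h % numFeatures) + numFeatures) % numFeatures // non-negative modulo
  }

  // Build an index -> words table from the corpus vocabulary.
  // Several words can collide on one index, hence Set[String].
  def indexToWords(corpus: Seq[Seq[String]]): Map[Int, Set[String]] =
    corpus.flatten.distinct
      .groupBy(termIndex)
      .map { case (i, ws) => i -> ws.toSet }

  def main(args: Array[String]): Unit = {
    val corpus = Seq(
      "to to Scala for better integration with Spark, and easier collaboration other".split(" ").toSeq,
      "For example in the case when the document is mostly about".split(" ").toSeq,
      "you need to to put some import declarations and create some data".split(" ").toSeq
    )
    val lookup = indexToWords(corpus)
    // The set stored at the index of "to" necessarily includes "to"
    // (plus any colliding words).
    println(lookup(termIndex("to")))
  }
}
```

The same pattern applied with Spark's own hash (one single-word document per distinct term, transformed through the fitted HashingTF, or indexOf on Spark 3.x) yields a lookup table that matches the indices in the rawFeatures column; collisions mean an index may map to more than one word, which is the price of the hashing trick.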