Convert org.apache.spark.mllib.linalg.Matrix to a Spark DataFrame in Scala

I have an input DataFrame input_df as follows:

+---------------+--------------------+
|Main_CustomerID|              Vector|
+---------------+--------------------+
|         725153|[3.0,2.0,6.0,0.0,9.0|
|         873008|[4.0,1.0,0.0,1.0,...|
|         625109|[1.0,0.0,6.0,1.0,...|
|         817171|[0.0,4.0,0.0,7.0,...|
|         611498|[1.0,0.0,4.0,5.0,...|
+---------------+--------------------+
input_df has the following schema:

root
 |-- Main_CustomerID: integer (nullable = true)
 |-- Vector: vector (nullable = true)
Following a linked reference, I created the indexed row matrix and then ran:

val lm = irm.toIndexedRowMatrix.toBlockMatrix.toLocalMatrix 
to find the cosine similarity between columns. This gives me a resulting mllib Matrix:

cosineSimilarity: org.apache.spark.mllib.linalg.Matrix =
0.0  0.4199605255658081  0.5744269579035528  0.22075539284417395  0.561434614044346
0.0  0.0                 0.2791452631195413  0.7259079527665503   0.6206918387272496
0.0  0.0                 0.0                 0.31792539222893695  0.6997167152675132
0.0  0.0                 0.0                 0.0                  0.6776404124278828
0.0  0.0                 0.0                 0.0                  0.0
Now I need to convert lm, which is of type org.apache.spark.mllib.linalg.Matrix, into a DataFrame. I would like the output DataFrame to look like this:

+---+------------------+------------------+-------------------+------------------+
| _1|                _2|                _3|                 _4|                _5|
+---+------------------+------------------+-------------------+------------------+
|0.0|0.4199605255658081|0.5744269579035528|0.22075539284417395| 0.561434614044346|
|0.0|               0.0|0.2791452631195413| 0.7259079527665503|0.6206918387272496|
|0.0|               0.0|               0.0|0.31792539222893695|0.6997167152675132|
|0.0|               0.0|               0.0|                0.0|0.6776404124278828|
|0.0|               0.0|               0.0|                0.0|               0.0|
+---+------------------+------------------+-------------------+------------------+

How can I do this in Scala?

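To convert the Matrix into the DataFrame shown, do the following. First convert the matrix into a DataFrame with a single column holding one array per matrix row. Then use foldLeft to split the array into separate columns: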

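import spark.implicits._

// Column indices 0 until numCols; used to name the output columns _1, _2, ...
val cols = (0 until lm.numCols).toSeq

// Transposing and then iterating over columns yields the rows of lm,
// giving a DataFrame with a single array column "arr"
val df = lm.transpose
  .colIter.toSeq
  .map(_.toArray)
  .toDF("arr")

// Add one column per array element, then drop the array column
val df2 = cols.foldLeft(df)((acc, i) => acc.withColumn("_" + (i + 1), $"arr"(i)))
  .drop("arr")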
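
As a variation on the approach above, the same result can probably be reached with rowIter (which avoids the transpose) and a single select instead of repeated withColumn calls. The sketch below is an illustration under assumptions, not part of the original answer: the object name MatrixToDataFrame, the helper toDataFrame, the local SparkSession, and the small 2x3 stand-in for lm are all hypothetical, and it assumes Spark 2.0 or later, where Matrix.rowIter is available.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object MatrixToDataFrame {

  // Hypothetical helper: one DataFrame row per matrix row, columns named _1, _2, ...
  def toDataFrame(spark: SparkSession, m: Matrix): DataFrame = {
    import spark.implicits._

    // rowIter yields each matrix row as a Vector; materialize the rows as arrays
    val rows: Seq[Array[Double]] = m.rowIter.map(_.toArray).toList

    // Build every output column up front and project them in one select
    val columns = (0 until m.numCols).map(i => col("arr")(i).as(s"_${i + 1}"))

    rows.toDF("arr").select(columns: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for illustration only
      .appName("matrix-to-dataframe")
      .getOrCreate()

    // Stand-in for lm; Matrices.dense takes its values in column-major order,
    // so this is the matrix with rows (1, 2, 3) and (4, 5, 6)
    val lm: Matrix = Matrices.dense(2, 3, Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0))

    toDataFrame(spark, lm).show()

    spark.stop()
  }
}
```

Building all columns first and projecting them in one select keeps the query plan to a single projection, which may matter for wide matrices; for a 5x5 matrix either version should behave much the same.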