Scala: How do I convert a column or sequence of vectors to a SparseMatrix? (Scala, Apache Spark, Matrix, Sparse Matrix) - Fatal编程技术网

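As the title says, I have a sequence of vectors (in a DataFrame column, but it can be turned into an RDD, or into a local sequence with .collect()). I want to collect these vectors into a local sparse matrix. To stay compatible with Spark 1.6.3, I need the mllib version of it.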

Collecting them as a sequence of SparseVectors, I get:

import org.apache.spark.mllib.linalg.SparseVector
val seq_of_vectors = df_with_vectors.select("sparse").rdd.map(_.getAs[SparseVector](0)).collect()
seq_of_vectors: Array[org.apache.spark.mllib.linalg.SparseVector] = ...
I can easily build a RowMatrix, but I don't see any way to convert a RowMatrix into a local matrix either:

import org.apache.spark.mllib.linalg.distributed.RowMatrix
val exampleMatrix = new RowMatrix(df_with_vectors.select("sparse").rdd.map(_.getAs[SparseVector](0)))
exampleMatrix: org.apache.spark.mllib.linalg.distributed.RowMatrix = org.apache.spark.mllib.linalg.distributed.RowMatrix@2e6273dc

Given a sequence of SparseVector objects of the form

seq_of_vectors: Array[org.apache.spark.mllib.linalg.SparseVector] = 
    Array(..., (262144,[136034,155107,166596],[0.8164965809277259,0.40824829046386296,0.40824829046386296]), ...
we first convert them into a coordinate list (COO), coo, of (row, column, value) tuples.
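A minimal sketch of that COO step, assuming the vectors are already collected on the driver as an Array[SparseVector] and that each vector's position in the array is its row index. The three sample vectors are hypothetical stand-ins for seq_of_vectors, and the object name CooSketch is illustrative only; spark-mllib must be on the classpath, but no SparkContext is needed for these local types.

```scala
import org.apache.spark.mllib.linalg.SparseVector

object CooSketch {
  // Hypothetical sample input: three sparse vectors of size 5,
  // standing in for the collected seq_of_vectors.
  val seq_of_vectors: Array[SparseVector] = Array(
    new SparseVector(5, Array(0, 3), Array(1.0, 2.0)),
    new SparseVector(5, Array(1), Array(3.0)),
    new SparseVector(5, Array(2, 4), Array(4.0, 5.0))
  )

  // Each vector becomes one matrix row: its array position is the row index,
  // and every stored (index, value) pair becomes a (row, col, value) triple.
  val coo: Array[(Int, Int, Double)] = seq_of_vectors.zipWithIndex.flatMap {
    case (v, row) =>
      v.indices.zip(v.values).map { case (col, value) => (row, col, value) }
  }
}
```

Only the explicitly stored entries are emitted, so the size of coo is the total number of stored values, not rows × columns.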

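Then we build the matrix with the SparseMatrix.fromCOO function. The number of rows is the number of vectors passed in; the number of columns is the length of the longest SparseVector: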

SparseMatrix.fromCOO(seq_of_vectors.length,
    seq_of_vectors.map(_.size).max,
    coo)

res223: org.apache.spark.mllib.linalg.SparseMatrix = 
84 x 262144 CSCMatrix
...
(28,136034) 0.8164965809277259
...
(28,155107) 0.40824829046386296
...
(28,166596) 0.40824829046386296
...
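Putting both steps together, here is a hedged sketch of a reusable helper. The object ToSparseMatrix and the method vectorsToSparseMatrix are hypothetical names; like the answer, it takes the column count from the longest vector. The empty-input guard is an addition of this sketch, since .max throws on an empty collection.

```scala
import org.apache.spark.mllib.linalg.{SparseMatrix, SparseVector}

object ToSparseMatrix {
  // Hypothetical helper: stacks the given vectors as rows of a local SparseMatrix.
  def vectorsToSparseMatrix(vs: Seq[SparseVector]): SparseMatrix = {
    // One COO triple (row, col, value) per stored entry; row = position in vs.
    val coo: Seq[(Int, Int, Double)] = vs.zipWithIndex.flatMap { case (v, row) =>
      v.indices.zip(v.values).map { case (col, value) => (row, col, value) }
    }
    // Rows = number of vectors; columns = size of the longest vector.
    val numCols = if (vs.isEmpty) 0 else vs.map(_.size).max
    SparseMatrix.fromCOO(vs.length, numCols, coo)
  }
}
```

For the question's data this would be called as ToSparseMatrix.vectorsToSparseMatrix(seq_of_vectors). SparseMatrix.fromCOO has been part of mllib since Spark 1.3, so this approach should also fit the Spark 1.6.3 constraint; the result is stored in compressed sparse column form, matching the CSCMatrix output above.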