Scala: How to compute the inverse of a RowMatrix in Apache Spark?


I have X, a distributed matrix, in RowMatrix form. I am using Spark 1.3.0. I need to be able to compute the inverse of X.

import org.apache.spark.mllib.linalg.{Vectors,Vector,Matrix,SingularValueDecomposition,DenseMatrix,DenseVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

def computeInverse(X: RowMatrix): DenseMatrix = {
  val nCoef = X.numCols.toInt
  val svd = X.computeSVD(nCoef, computeU = true)
  if (svd.s.size < nCoef) {
    sys.error(s"RowMatrix.computeInverse called on singular matrix.")
  }

  // Build inv(S): a diagonal matrix holding the reciprocals of the singular values.
  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => math.pow(x,-1))))

  // U is a distributed RowMatrix, so collect it into a local DenseMatrix for the final product.
  // collect yields row-major data while DenseMatrix is column-major, so for square U this holds transpose(U).
  val U = new DenseMatrix(svd.U.numRows().toInt,svd.U.numCols().toInt,svd.U.rows.collect.flatMap(x => x.toArray))

  // V is already local, so it is used as-is; a distributed V might scale better.
  val V = svd.V
  // inv(X) = V*inv(S)*transpose(U)  --- the U is already transposed.
  (V.multiply(invS)).multiply(U)
}
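
For reference, the identity used in the last line follows directly from the SVD. This is a short derivation, assuming X is square and non-singular, so that U and V are orthogonal and every singular value is non-zero:

```latex
X = U \Sigma V^{\top}
\quad\Longrightarrow\quad
X^{-1} = \left(V^{\top}\right)^{-1} \Sigma^{-1} U^{-1} = V \Sigma^{-1} U^{\top},
\qquad
\Sigma^{-1} = \operatorname{diag}\!\left(\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_n}\right)
```

In the code, invS plays the role of \Sigma^{-1} and the local matrix U holds U^{\top}.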


I ran into a problem when using this function with the option

conf.set("spark.sql.shuffle.partitions", "12")

The rows in the RowMatrix were shuffled.

Here is an update that worked for me:

import org.apache.spark.mllib.linalg.{DenseMatrix,DenseVector}
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

def computeInverse(X: IndexedRowMatrix): DenseMatrix = {
  val nCoef = X.numCols.toInt
  val svd = X.computeSVD(nCoef, computeU = true)
  if (svd.s.size < nCoef) {
    sys.error(s"IndexedRowMatrix.computeInverse called on singular matrix.")
  }

  // Create the inv diagonal matrix from S 
  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => math.pow(x, -1))))

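  // U must be local for the final product. Multiplying by the identity converts the local
  // Matrix into a DenseMatrix; the identity is k x k (numCols) so the shapes match an m x k U.
  val U = svd.U.toBlockMatrix().toLocalMatrix().multiply(DenseMatrix.eye(svd.U.numCols().toInt)).transpose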

  val V = svd.V
  (V.multiply(invS)).multiply(U)
}
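
Why the indexed version is not affected by shuffling can be sketched in plain Scala, without Spark. The `IndexedRow` type alias and the `toColumnMajor` helper below are hypothetical stand-ins, not MLlib APIs; the assumption is only that each row carries its own index and is placed by that index rather than by arrival order.

```scala
// Hypothetical stand-in for an indexed row: (row index, row values). Not the MLlib class.
object IndexedRowsSketch {
  type IndexedRow = (Long, Array[Double])

  /** Places each row at the position given by its index and returns the
    * m x k matrix as a column-major array, independent of input order. */
  def toColumnMajor(rows: Seq[IndexedRow], m: Int, k: Int): Array[Double] = {
    val out = new Array[Double](m * k)
    for ((i, values) <- rows; j <- 0 until k) {
      out(i.toInt + j * m) = values(j) // column-major: entry (i, j) lives at i + j * m
    }
    out
  }

  def main(args: Array[String]): Unit = {
    val ordered = Seq((0L, Array(1.0, 2.0)), (1L, Array(3.0, 4.0)), (2L, Array(5.0, 6.0)))
    val shuffled = Seq(ordered(2), ordered(0), ordered(1))
    // Both lines print: 1.0, 3.0, 5.0, 2.0, 4.0, 6.0
    println(toColumnMajor(ordered, 3, 2).mkString(", "))
    println(toColumnMajor(shuffled, 3, 2).mkString(", "))
  }
}
```

A plain RowMatrix has no such index, so a local copy built from collected rows depends on whatever order the rows arrive in.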


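The matrix U returned by X.computeSVD has dimensions m x k, where m is the number of rows of the original (distributed) RowMatrix X. One would expect m to be large (possibly much larger than k), so collecting U in the driver is not advisable if we want the code to scale to really large values of m.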

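I would say that both of the solutions given here suffer from this flaw. The answer given by @Alexander Kharlamov calls

val U = svd.U.toBlockMatrix().toLocalMatrix()

which collects the matrix in the driver. The same happens with the answer given by @gramps_lika_Spyder (by the way, your nick rocks!!), which calls

svd.U.rows.collect.flatMap(x => x.toArray)

I would suggest using a distributed matrix multiplication instead, as in the posted Scala code.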
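
One way to keep U distributed, sketched here under assumptions rather than taken from the linked code: since transpose(inv(X)) = U * inv(S) * transpose(V), the small k x k factor inv(S) * transpose(V) can be built locally and passed to RowMatrix.multiply, which multiplies each distributed row by a local matrix. The object and function names are hypothetical, and the result is the transpose of the inverse, returned as a RowMatrix.

```scala
import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object DistributedInverseSketch {
  /** Returns transpose(inv(X)) as a distributed RowMatrix, without collecting U.
    * Assumes X has full column rank; otherwise an error is raised. */
  def computeInverseTransposed(X: RowMatrix): RowMatrix = {
    val k = X.numCols().toInt
    val svd = X.computeSVD(k, computeU = true)
    if (svd.s.size < k) {
      sys.error("computeInverseTransposed called on singular matrix.")
    }
    // inv(S): k x k local diagonal matrix of reciprocal singular values.
    val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(1.0 / _)))
    // B = transpose(V * inv(S)) = inv(S) * transpose(V), still a small k x k local matrix.
    val B = svd.V.multiply(invS).transpose
    // U * B = U * inv(S) * transpose(V) = transpose(V * inv(S) * transpose(U)),
    // computed row by row on the executors.
    svd.U.multiply(B)
  }
}
```

Only V, invS and B (all k x k) live in the driver. If row order matters, the same idea should carry over to IndexedRowMatrix, which also provides computeSVD and multiply with a local matrix.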

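I don't see any inverse computation in the link you added.

@Gramps_lika_Spyder The link is about distributed matrix multiplication, meant to replace the local matrix multiplication

(V.multiply(invS)).multiply(U)

in the last line of your solution, so that you don't need to collect U in the driver. I think V and invS are not large enough to cause problems.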