Scala: How to compute the inverse of a RowMatrix in Apache Spark?

Tags: scala, apache-spark, linear-algebra, distributed-computing

I have X, a distributed matrix, in RowMatrix form. I am using Spark 1.3.0. I need to be able to compute the inverse of X.

import org.apache.spark.mllib.linalg.{Vectors,Vector,Matrix,SingularValueDecomposition,DenseMatrix,DenseVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

def computeInverse(X: RowMatrix): DenseMatrix = {
  val nCoef = X.numCols.toInt
  val svd = X.computeSVD(nCoef, computeU = true)
  if (svd.s.size < nCoef) {
    sys.error(s"RowMatrix.computeInverse called on singular matrix.")
  }

  // Create the inv diagonal matrix from S 
  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => math.pow(x,-1))))

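  // U is needed as a local matrix here, so it is collected to the driver.
  // The rows are flattened row by row while DenseMatrix stores values column by column,
  // so for a square U the matrix built below is already transpose(U).
  val U = new DenseMatrix(svd.U.numRows().toInt, svd.U.numCols().toInt, svd.U.rows.collect.flatMap(x => x.toArray))

  // V is already local; keeping it distributed might scale better, but this is probably fine.
  val V = svd.V
  // inv(X) = V * inv(S) * transpose(U); U was already transposed above.
  (V.multiply(invS)).multiply(U)
}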
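The remark that "the U is already transposed" depends on storage layout: mllib's DenseMatrix stores its values column by column, while the collected rows are flattened row by row. A minimal plain-Scala sketch (no Spark needed; the 2 x 3 values and the names LayoutSketch and transposedAt are made up for illustration) shows that reading a row-by-row flattening back column by column yields the transpose:

```scala
object LayoutSketch {
  // Hypothetical 2 x 3 matrix given as rows, standing in for the collected rows of U.
  val rows: Array[Array[Double]] = Array(
    Array(1.0, 2.0, 3.0),
    Array(4.0, 5.0, 6.0)
  )

  // Same flattening as svd.U.rows.collect.flatMap(x => x.toArray): one row after another.
  val flat: Array[Double] = rows.flatMap(r => r)

  // Column-major lookup into `flat`, treating it as a 3 x 2 matrix:
  // entry (i, j) lives at index i + j * numRows, with numRows = 3.
  def transposedAt(i: Int, j: Int): Double = flat(i + j * 3)
}

// Entry (i, j) of the 3 x 2 view equals entry (j, i) of the original rows.
val layoutMatches = (0 until 3).forall { i =>
  (0 until 2).forall { j => LayoutSketch.transposedAt(i, j) == LayoutSketch.rows(j)(i) }
}
```

In the answer above the flattened array is passed with numRows and numCols of U itself, which coincides with this transposed view only when U is square, as it is for an invertible X.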
I ran into problems using the function above with the option

conf.set("spark.sql.shuffle.partitions", "12")

The rows in the RowMatrix were shuffled.

Here is an update that worked for me:

import org.apache.spark.mllib.linalg.{DenseMatrix,DenseVector}
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

def computeInverse(X: IndexedRowMatrix): DenseMatrix = {
  val nCoef = X.numCols.toInt
  val svd = X.computeSVD(nCoef, computeU = true)
  if (svd.s.size < nCoef) {
    sys.error(s"IndexedRowMatrix.computeInverse called on singular matrix.")
  }

  // Create the inv diagonal matrix from S 
  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => math.pow(x, -1))))

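  // Collect U to the driver as a local matrix. Multiplying by the identity turns the
  // generic Matrix into a DenseMatrix (this assumes U is square); then transpose it.
  val U = svd.U.toBlockMatrix().toLocalMatrix().multiply(DenseMatrix.eye(svd.U.numRows().toInt)).transpose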

  val V = svd.V
  (V.multiply(invS)).multiply(U)
}

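The matrix U returned by X.computeSVD has dimensions m x k, where m is the number of rows of the original (distributed) RowMatrix X. One would expect m to be large (possibly larger than k), so collecting it in the driver is not advisable if we want our code to scale to really large values of m.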


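I would say that both of the solutions posted here suffer from this flaw. The answer given by @Alexander Kharlamov calls val U = svd.U.toBlockMatrix().toLocalMatrix(), which collects the matrix in the driver. The same happens with the answer given by @Gramps_lika_Spyder (btw, your nick rocks!!), which calls svd.U.rows.collect.flatMap(x => x.toArray). I would rather suggest using a distributed matrix multiplication, as in the linked Scala code.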

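I don't see any inverse calculation in the link you added.

@Gramps_lika_Spyder The link is about distributed matrix multiplication, to replace the local matrix multiplication (V.multiply(invS)).multiply(U) in the last line of the solutions, so that you don't need to collect U in the driver. I think V and invS are not big enough to cause problems.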
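One way to avoid collecting U, offered here as a sketch rather than as the code from the link mentioned above, is to compute the transpose of the inverse. Since inv(X) = V * inv(S) * transpose(U), its transpose is U * inv(S) * transpose(V), and that is a distributed matrix times a small local one, which RowMatrix.multiply handles without bringing U to the driver. The names DistributedInverse and computeInverseTransposed are hypothetical, and the result is a RowMatrix whose i-th row is the i-th column of inv(X):

```scala
import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object DistributedInverse {
  // Returns transpose(inv(X)) as a distributed RowMatrix; U never leaves the executors.
  def computeInverseTransposed(X: RowMatrix): RowMatrix = {
    val nCoef = X.numCols().toInt
    val svd = X.computeSVD(nCoef, computeU = true)
    if (svd.s.size < nCoef) {
      sys.error("computeInverseTransposed called on singular matrix.")
    }

    // inv(S) as a small local diagonal matrix (k x k).
    val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => 1.0 / x)))

    // Local k x k product V * inv(S); its transpose is inv(S) * transpose(V).
    val vInvS = svd.V.multiply(invS)

    // Distributed product: U * inv(S) * transpose(V) = transpose(V * inv(S) * transpose(U)).
    svd.U.multiply(vInvS.transpose)
  }
}
```

Only V and inv(S), both k x k, are held locally. Because a RowMatrix carries no row indices, the caveat about shuffled rows raised earlier still applies to the result; IndexedRowMatrix offers a multiply with a local matrix as well, which would keep the indices.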