Simple matrix multiplication in Scala Spark

I'm trying to learn some very basic Spark code. I want to define a matrix x with two columns. This is what I tried:

scala> val s = breeze.linalg.linspace(-3,3,5)
s: breeze.linalg.DenseVector[Double] = DenseVector(-3.0, -1.5, 0.0, 1.5, 3.0) // in this case I want s to be both column 1 and column 2 of x

scala> val ss = s.toArray ++ s.toArray
ss: Array[Double] = Array(-3.0, -1.5, 0.0, 1.5, 3.0, -3.0, -1.5, 0.0, 1.5, 3.0)

scala> import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg.distributed.RowMatrix

scala> val mat = new RowMatrix(ss, 5, 2)
<console>:17: error: type mismatch;
 found   : Array[Double]
 required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
       val mat = new RowMatrix(ss, 5, 2)

I don't understand how to get the right conversion so that I can pass the values to the distributed matrix.

Edit: maybe I have been able to solve it:

scala> val s = breeze.linalg.linspace(-3,3,5)
s: breeze.linalg.DenseVector[Double] = DenseVector(-3.0, -1.5, 0.0, 1.5, 3.0)

scala> val ss = s.to
toArray         toDenseMatrix   toDenseVector   toScalaVector   toString        
toVector        

scala> val ss = s.toArray ++ s.toArray
ss: Array[Double] = Array(-3.0, -1.5, 0.0, 1.5, 3.0, -3.0, -1.5, 0.0, 1.5, 3.0)

scala> val x = new breeze.linalg.Dense
DenseMatrix   DenseVector   

scala> val x = new breeze.linalg.DenseMatrix(5, 2, ss)
x: breeze.linalg.DenseMatrix[Double] = 
-3.0  -3.0  
-1.5  -1.5  
0.0   0.0   
1.5   1.5   
3.0   3.0   

scala> val xDist = sc.parallelize(x.toArray)
xDist: org.apache.spark.rdd.RDD[Double] = ParallelCollectionRDD[0] at parallelize at <console>:18

Something like this. It type-checks, but for some reason it won't run in my Scala worksheet:

import org.apache.spark.mllib.linalg._
import org.apache.spark.mllib.linalg.distributed._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD

val conf = new SparkConf().setAppName("spark-scratch").setMaster("local")
val sc= new SparkContext(conf)

// the values for the column in each row
val col = List(-3.0, -1.5, 0.0, 1.5, 3.0) ;

// make two rows of the column values, transpose it,
// make Vectors of the result
val t = List(col,col).transpose.map(r=>Vectors.dense(r.toArray))

// make an RDD from the resultant sequence of Vectors, and 
// make a RowMatrix from that.
val rm = new RowMatrix(sc.makeRDD(t));

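makeRDD (a method of SparkContext) will make an RDD from a collection, so you probably want sc.makeRDD(ss) as the first argument to RowMatrix? It also refers to an MLlib type; you might want to look at the examples on that page.

@RichHenry I was looking at that example, but I don't understand if I need something like linspace

@Paul your solution doesn't work:
scala> val mat = new RowMatrix(sc.makeRDD(ss), 5, 2)
<console>:17: error: type mismatch;
 found   : Array[Double]
 required: Seq[org.apache.spark.mllib.linalg.Vector]
       val mat = new RowMatrix(sc.makeRDD(ss), 5, 2)

Your code seems a bit confused. RowMatrix wants an RDD containing the rows of the matrix, one Vector per row. So in your example that is five rows, each containing two columns. You are passing an array of 10 doubles. In s you seem to be building a vector out of the two columns rather than the five rows.

That is very similar to your recent edit, but your edit builds the matrix better.

Thanks, this compiles for me in Scala worksheet: 0.2.6.v-2_11-201501121831-8101792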
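
To illustrate the rows-versus-columns point from the comments, here is a minimal plain-Scala sketch (no Spark needed) that regroups the question's flat, column-by-column array into five rows of two values. The names numRows and rows are made up for this illustration, and ss is written out literally instead of being built with linspace.

```scala
// Column-by-column values, as produced by s.toArray ++ s.toArray in the question.
val ss: Array[Double] = Array(-3.0, -1.5, 0.0, 1.5, 3.0, -3.0, -1.5, 0.0, 1.5, 3.0)

// Hypothetical name: the number of rows in the intended 5 x 2 matrix.
val numRows = 5

// Cut the flat array into columns of length numRows, then transpose
// so that each inner List holds the values of one row.
val rows: List[List[Double]] = ss.toList.grouped(numRows).toList.transpose

rows.foreach(r => println(r.mkString("  ")))
```

Each element of rows could then be wrapped with Vectors.dense(r.toArray) and passed through sc.makeRDD, as the answer above does for its list t; that step is left out here so the sketch runs without a SparkContext.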