Summing an array of vectors by one of its fields (Scala)

Tags: scala, vector, mahout

I have an array of vectors in Scala:

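import scala.collection.mutable.ArrayBuffer
import org.apache.mahout.math.{ VectorWritable, Vector, DenseVector }
import org.apache.mahout.clustering.dirichlet.UncommonDistributions

// num is the upper bound of the range; its definition is not shown in the question
val data = new ArrayBuffer[Vector]()
for (i <- 100 to num) {
  // Field 0 is the grouping key; fields 1 and 2 are random samples
  data += new DenseVector(Array[Double](
    i % 30,
    UncommonDistributions.rNorm(100, 100),
    UncommonDistributions.rNorm(100, 100)
  ))
}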

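I suggest using the groupBy method from the collections library.

It builds a Map whose keys are the results of the discriminator function you pass in, and whose values are the vectors that produced each key.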

Edit: some code examples:

// I built a different collection of Vectors because I don't have the Mahout dependencies,
// but the output is similar.
// A List of Vectors with 3 values each
val num = 100
val data = (0 to num).toList.map(n => {
  Vector(n % 30, n / 100, n * 100)
})

// groupBy creates a Map whose key is the result of the function
// and whose value is the List of Vectors that produced that key.
// Here the function returns the first value of each Vector.
val group = data.groupBy(v => { v.apply(0) })

// A subset of the result:
// group:
// scala.collection.immutable.Map[Int,List[scala.collection.immutable.Vector[Int]]] = Map(0 -> List(Vector(0, 0, 0), Vector(0, 0, 3000), Vector(0, 0, 6000), Vector(0, 0, 9000)), 5 -> List(Vector(5, 0, 500), Vector(5, 0, 3500), Vector(5, 0, 6500), Vector(5, 0, 9500)))

Use groupBy on the list, then map over each group. It takes a single line of code:

data.groupBy(_(0)).map { case (k, v) => k -> v.map(_(2)).sum }
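To make the one-liner easier to check, here is a minimal self-contained sketch. It assumes plain Scala Vector[Int] values as a stand-in for Mahout vectors (the same substitution the first answer makes), and the object name SumByField is hypothetical. With real Mahout vectors the element access would need to be adapted.

```scala
object SumByField {
  // Stand-in data (assumption): plain Scala Vectors instead of Mahout vectors.
  val num = 100
  val data: List[Vector[Int]] =
    (0 to num).toList.map(n => Vector(n % 30, n / 100, n * 100))

  // Group by field 0, then sum field 2 inside each group.
  val sums: Map[Int, Int] =
    data.groupBy(_(0)).map { case (k, vs) => k -> vs.map(_(2)).sum }

  def main(args: Array[String]): Unit = {
    println(sums(0)) // 18000 = 0 + 3000 + 6000 + 9000
    println(sums(5)) // 20000 = 500 + 3500 + 6500 + 9500
  }
}
```

Calling the methods with dot notation avoids the postfix-operator form of sum, which newer compilers reject unless scala.language.postfixOps is imported.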

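Thanks, but how do I compute the sum now?

You need to fold each value of the map into a single vector, adding up the second and third fields. The goal is to end up with a Map[Double, Vector]. If you don't mind, I'll let you try it yourself first; if you get stuck, I'll give you an example.

This is what I came up with for summing and sorting in descending order: group.mapValues(_.foldLeft(0)(_ + _(2)))

Do you have a better solution?

Yes, but all the solutions seem a bit clumsy: first map over the values, then sum, then convert to a list, then sort. Isn't there a shorter way?

The collections have a sum method that could come in handy. However, you would have to change the input format completely to use it (perhaps tuples instead of vectors), which may not be what you want.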
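The fold described in the comments (collapse each group into one vector, then sort in descending order) could look roughly like the sketch below. It again assumes plain Scala Vector[Int] values rather than Mahout vectors, so the result is a Map[Int, Vector[Int]] instead of the Map[Double, Vector] mentioned above. The names FoldGroups and addTail are hypothetical.

```scala
object FoldGroups {
  // Stand-in data (assumption): plain Scala Vectors instead of Mahout vectors.
  val data: List[Vector[Int]] =
    (0 to 100).toList.map(n => Vector(n % 30, n / 100, n * 100))

  // Add fields 1 and 2 of two vectors; field 0 (the key) is kept from the first one.
  def addTail(a: Vector[Int], b: Vector[Int]): Vector[Int] =
    Vector(a(0), a(1) + b(1), a(2) + b(2))

  // groupBy never produces empty groups, so reduce is safe here.
  val folded: Map[Int, Vector[Int]] =
    data.groupBy(_(0)).map { case (k, vs) => k -> vs.reduce(addTail) }

  // A Map has no ordering, so convert to a List and sort by the summed third field, largest first.
  val sortedDesc: List[(Int, Vector[Int])] =
    folded.toList.sortBy { case (_, v) => -v(2) }

  def main(args: Array[String]): Unit =
    sortedDesc.take(3).foreach(println)
}
```

Using map with a pattern match instead of mapValues keeps the sketch portable, since mapValues on a Map is deprecated in Scala 2.13 and returns a lazy view there.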