Summing an array of vectors by one of its fields (Scala)

scala vector

I have an array of vectors in Scala:

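import scala.collection.mutable.ArrayBuffer
import org.apache.mahout.math.{ VectorWritable, Vector, DenseVector }
import org.apache.mahout.clustering.dirichlet.UncommonDistributions

// num (the upper bound of the loop) is defined elsewhere
val data = new ArrayBuffer[Vector]()
for (i <- 100 to num) {
  // Field 0 is the grouping key; fields 1 and 2 are random values
  data += new DenseVector(Array[Double](
    i % 30,
    UncommonDistributions.rNorm(100, 100),
    UncommonDistributions.rNorm(100, 100)
  ))
}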

I suggest using the groupBy method from the collections API.

It creates a Map of vectors, keyed by the discriminator function you specify.
Edit: a code example:
// I created a different Array of Vector as I don't have Mahout dependencies
// But the output is similar
// A List of Vectors with 3 values inside
val num = 100
val data = (0 to num).toList.map(n => {
  Vector(n % 30, n / 100, n * 100)
})

// groupBy builds a Map whose key is the result of the function passed to it.
// Here, the function returns the first value of each Vector.
val group = data.groupBy(v => { v.apply(0) })

// A subset of the result:
// group:
// scala.collection.immutable.Map[Int,List[scala.collection.immutable.Vector[Int]]] = Map(0 -> List(Vector(0, 0, 0), Vector(0, 0, 3000), Vector(0, 0, 6000), Vector(0, 0, 9000)), 5 -> List(Vector(5, 0, 500), Vector(5, 0, 3500), Vector(5, 0, 6500), Vector(5, 0, 9500)))

Use groupBy on the list, then map over each group. It takes just one line of code:
 data.groupBy(_(0)).map { case (k, v) => k -> v.map(_(2)).sum }
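
The one-liner above can be spelled out as the following minimal sketch. It uses plain Scala `Vector[Double]` rows as a stand-in for Mahout's `DenseVector` (an assumption, so that it runs without the Mahout dependency), and the names `GroupSumSketch`, `rows`, `sumOfThird` and `sortedDesc` are made up for illustration.

```scala
object GroupSumSketch {
  // Stand-in data: plain Scala Vectors, not Mahout vectors (assumption).
  // Field 0 is the grouping key; field 2 is the value being summed.
  val rows: List[Vector[Double]] = List(
    Vector(1.0, 10.0, 100.0),
    Vector(2.0, 20.0, 200.0),
    Vector(1.0, 30.0, 300.0)
  )

  // Group the rows by their first field, then sum the third field in each group.
  val sumOfThird: Map[Double, Double] =
    rows.groupBy(_(0)).map { case (key, group) => key -> group.map(_(2)).sum }

  // Turn the Map into a list of (key, total) pairs, largest total first.
  val sortedDesc: List[(Double, Double)] =
    sumOfThird.toList.sortBy { case (_, total) => -total }
}
```

Note that this sums only the third field. Sorting requires converting the Map to a list first, because a plain Map has no ordering.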

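Thanks, but now, how can I do the sum?
You need to fold each value of the map into a single vector, adding up the second and third values. The goal is to end up with a Map of [Double, Vector]. If you don't mind, I'll let you try it yourself; if you don't succeed, I'll give you an example.
This is what I came up with for summing, to be sorted in descending order: group.mapValues(_.foldLeft(0)(_ + _(2)))
Do you have a better solution?
Yes, but all of the solutions seem a bit clumsy: first mapValues, then sum, then convert to a list, then sort... Isn't there a shorter solution?
Collections have a sum method that can come in handy. However, you would need to change the input format completely to use it (perhaps using tuples instead of vectors), which may not be what you want to do.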
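
The fold described in the comments (one vector per key, holding the sums of the second and third values) could look roughly like the sketch below. It again assumes plain Scala `Vector[Double]` rows rather than Mahout vectors, and `GroupVectorSumSketch`, `addRows` and `totals` are hypothetical names introduced only for this example.

```scala
object GroupVectorSumSketch {
  // Stand-in data: plain Scala Vectors, not Mahout vectors (assumption).
  val rows: List[Vector[Double]] = List(
    Vector(1.0, 10.0, 100.0),
    Vector(2.0, 20.0, 200.0),
    Vector(1.0, 30.0, 300.0)
  )

  // Add two rows position by position (hypothetical helper).
  def addRows(a: Vector[Double], b: Vector[Double]): Vector[Double] =
    a.zip(b).map { case (x, y) => x + y }

  // For each key, drop the key field from every row and reduce the
  // remaining fields into a single Vector of per-field sums.
  // reduce is safe here because groupBy never produces an empty group.
  val totals: Map[Double, Vector[Double]] =
    rows.groupBy(_(0)).map { case (key, group) =>
      key -> group.map(_.tail).reduce(addRows)
    }
}
```

With Mahout vectors the shape would presumably be the same, with `get(0)` for the key and `plus` for the addition, but that variant has not been checked here.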