Scala: efficiently converting a Seq[A] into a frequency Map[A, Int]


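I implemented four correct generic versions of this function, plus a non-generic one for comparison. I would like a semi-idiomatic Scala version that runs faster, closer to the Java-class implementation.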

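groupByAndCount: the cleanest and most elegant. Unfortunately, it is slow.

foldImmutable: fully immutable internally. Runs even slower.

iterateToMutable: a simple mutable version. Still slow.

iterateToJavaMutable: uses a Java (mutable) HashMap, which provides a compute function, so the code avoids separate get/set calls on each element iteration.

fixedTypeLongCustomMap: uses a custom non-generic collection, it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap, and runs fastest.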

Here are some JMH benchmark results:

Benchmark                                     Mode  Cnt  Score   Error  Units
FreqMapGenerationJava.fixedTypeLongCustomMap  avgt    5  0.255 ± 0.061   s/op
FreqMapGenerationJava.foldImmutable           avgt    5  3.728 ± 0.318   s/op
FreqMapGenerationJava.groupByAndCount         avgt    5  1.315 ± 0.405   s/op
FreqMapGenerationJava.iterateToJavaMutable    avgt    5  0.654 ± 0.080   s/op
FreqMapGenerationJava.iterateToMutable        avgt    5  1.356 ± 0.240   s/op
Here is the full Scala code:

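  import scala.collection.{immutable, mutable}
  import it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap

  def foldImmutable[A](l: Seq[A]): immutable.Map[A, Int] = {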
    def foldF(m: immutable.Map[A, Int], a: A): immutable.Map[A, Int] = {
      m + (a -> (m.getOrElse(a, 0) + 1))
    }

    l.foldLeft(immutable.Map[A, Int]())(foldF)
  }

  def groupByAndCount[A](l: Seq[A]): immutable.Map[A, Int] =
    l.groupBy(x => x).mapValues(l => l.size)

  def iterateToMutable[A](l: Seq[A]): mutable.Map[A, Int] = {
    val m = mutable.Map[A, Int]()
    for (a <- l) {
      m(a) = m.getOrElse(a, 0) + 1
    }
    m
  }

  def iterateToJavaMutable[A](l: Seq[A]): java.util.Map[A, Int] = {
    val m = new java.util.HashMap[A, Int]()
    for (a <- l) {
      m.compute(a, (k, v) => if (v == null) 1 else v + 1)
    }
    m
  }

  def fixedTypeLongCustomMap(l: Seq[Long]): Long2IntOpenHashMap = {
    val m = new Long2IntOpenHashMap
    for (a <- l) {
      m.addTo(a, 1)
    }
    m
  }
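
The Java-map variant relies on compute to avoid separate get/set calls per element. java.util.Map also provides merge, which inserts the given value for a new key and otherwise combines it with the stored value. The following is only a sketch: the name iterateToJavaMerge is hypothetical, the value type is java.lang.Integer rather than Int (an assumption made to keep null handling out of the lambda), and it has not been benchmarked against the versions above.

```scala
import java.util.{HashMap => JHashMap}
import java.util.function.BiFunction

object MergeFreq {
  // Reused for every element: adds two boxed counts.
  private val add = new BiFunction[Integer, Integer, Integer] {
    override def apply(x: Integer, y: Integer): Integer =
      Integer.valueOf(x.intValue + y.intValue)
  }

  // Hypothetical variant of iterateToJavaMutable built on Map.merge:
  // a new key is stored with count 1, an existing key has 1 added to its count.
  def iterateToJavaMerge[A](l: Seq[A]): java.util.Map[A, Integer] = {
    val m = new JHashMap[A, Integer]()
    val one = Integer.valueOf(1)
    val it = l.iterator
    while (it.hasNext) {
      m.merge(it.next(), one, add)
    }
    m
  }
}
```

Whether this is faster than the compute version would have to be measured with the same JMH setup.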
Have you tried the Scala standard library's groupBy on both mutable and immutable collections?

On an immutable Seq:

scala> Seq(1, 2, 2, 2, 1, 1, 1, 4, 4).groupBy(identity).mapValues(_.size)
res16: scala.collection.immutable.Map[Int,Int] = Map(2 -> 3, 4 -> 2, 1 -> 4)
On a mutable ListBuffer:

scala> ListBuffer(1, 2, 1, 2).groupBy(identity).mapValues(_.size)
res17: scala.collection.immutable.Map[Int,Int] = Map(2 -> 2, 1 -> 2)
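
On Scala 2.13 or later (an assumption, since the thread does not state a Scala version), the standard library also has groupMapReduce, which counts in one pass without building the intermediate groups that groupBy creates. A minimal sketch, with the object and method names chosen only for illustration:

```scala
object GroupMapReduceFreq {
  // Requires Scala 2.13+, where groupMapReduce was added to the collections.
  // key = the element itself, mapped value = 1, reduction = addition.
  def freq[A](l: Seq[A]): Map[A, Int] =
    l.groupMapReduce(identity)(_ => 1)(_ + _)
}
```

This has not been run through the JMH benchmark from the question, so its speed relative to the other versions is unknown.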
Try this:

def foldImmutableAggregate[A](l: Seq[A]) : Map[A, Int] = {
  l.aggregate(Map[A, Int]())({ (sum, ch) => sum ++ Map(ch -> (sum.getOrElse(ch, 0) + 1)) }, (a, b) => a ++ b) 
}


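To process the Seq in batches:

def foldImmutableAggregate[A](l: Seq[A]) : Map[A, Int] = {
  // Partitions are counted independently, so the combine step must add counts per key
  // (a plain `a ++ b` would keep only b's count for keys present in both).
  l.par.aggregate(Map[A, Int]())({ (sum, ch) => sum ++ Map(ch -> (sum.getOrElse(ch, 0) + 1)) },
    (a, b) => b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) })
}

I added groupByAndCount to the OP and benchmarked it. It actually runs faster than I expected, but it is still too slow.

How did you benchmark it?

As I said in the original post, with jmh: openjdk.java.net/projects/code-tools/jmh/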
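
In the parallel aggregate variant, each partition produces its own partial frequency map, and the combine step has to add the counts of keys that occur in more than one partition. Map's ++ keeps only the right-hand value for such keys. A minimal sketch of the difference (mergeCounts is a hypothetical helper name, not part of the standard library):

```scala
object MergeCounts {
  // Adds the counts of two partial frequency maps key by key.
  def mergeCounts[A](a: Map[A, Int], b: Map[A, Int]): Map[A, Int] =
    b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) }

  def main(args: Array[String]): Unit = {
    val left = Map("x" -> 2, "y" -> 1)
    val right = Map("x" -> 3)
    println(left ++ right)            // "x" maps to 3: the left count is overwritten
    println(mergeCounts(left, right)) // "x" maps to 5: the counts are added
  }
}
```

In a sequential aggregate the combine function is never called, so the problem only shows up once .par is used.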