Simple histogram in Scala
For example, for a given Array[Double]
val a = Array.tabulate(100){ _ => Random.nextDouble() * 10 }
what is a simple way to compute a histogram with n bins?

How about this:
object Hist {
  type Bins = Map[Double, List[Double]]

  // artificially increasing bucket length to overcome last-point issue
  private val Epsilon = 0.000001

  def histogram(data: List[Double], binsCount: Int): Bins = {
    require(data.length > binsCount)
    val sorted = data.sorted
    val min = sorted.head
    val max = sorted.last
    val binLength = (max - min) / binsCount + Epsilon
    // bins are keyed by their upper bound; a bin that was never filled defaults to Nil
    val bins = Map.empty[Double, List[Double]].withDefaultValue(Nil)
    scatterToBins(sorted, min + binLength, binLength, bins)
  }

  @annotation.tailrec
  private def scatterToBins(xs: List[Double], upperBound: Double, binLength: Double, bins: Bins): Bins = xs match {
    case Nil => bins
    // the point lies beyond the current bin: move to the next bin and retry,
    // so that a point is never dropped into a bin it has already passed
    case point :: _ if point >= upperBound =>
      scatterToBins(xs, upperBound + binLength, binLength, bins)
    case point :: tail =>
      val newBin = point :: bins(upperBound)
      scatterToBins(tail, upperBound, binLength, bins + (upperBound -> newBin))
  }

  // now let's test this out
  val data = Array.tabulate(100){ _ => scala.util.Random.nextDouble() * 10 }
  val result = histogram(data.toList, 5)
  val pointsPerBucket = result.values.map(xs => xs.length)
}
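I cheated a bit by using a List instead of an Array, but I hope that is fine for you.

The values are prepared in much the same way as in @om-nom-nom's answer, but the histogram method itself becomes very small by using partition: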
case class Distribution(nBins: Int, data: List[Double]) {
  require(data.length > nBins)

  val Epsilon = 0.000001
  val (max, min) = (data.max, data.min)
  val binWidth = (max - min) / nBins + Epsilon
  // upper bound of each bin, in ascending order
  val bounds = (1 to nBins).map { x => min + binWidth * x }.toList

  def histo(bounds: List[Double], data: List[Double]): List[List[Double]] =
    bounds match {
      case Nil => Nil // no bins requested
      case _ :: Nil => List(data) // the last bin takes everything that is left
      case h :: t =>
        val (l, r) = data.partition(_ < h)
        l :: histo(t, r)
    }

  val histogram = histo(bounds, data)
}
so that, with h holding the histogram value of a Distribution, the number of points per bin is
val tabulated = h.map {_.size}
How about this:
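val num_bins = 20
val mx = a.max.toDouble
val mn = a.min.toDouble
val hist = a
  // map each value to a bin index; clamp so that x == mx lands in the last bin
  .map(x => math.min((((x.toDouble - mn) / (mx - mn)) * num_bins).floor.toInt, num_bins - 1))
  .groupBy(x => x)
  .map(x => x._1 -> x._2.size)
  .toSeq
  .sortBy(x => x._1)
  .map(x => x._2)
// note: bins that receive no values do not appear in hist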
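
The groupBy chain above only produces entries for bin indices that actually occur, so empty bins are missing from the result. A minimal sketch of the same index-based idea that reports empty bins as 0 could look like the following. The names FixedBins and counts are made up for this illustration, and the treatment of the case where all values are equal (everything goes into bin 0) is an assumption, not something the answer above specifies.

```scala
object FixedBins {
  // Counts how many values fall into each of `numBins` equal-width bins
  // spanning [min, max] of the data. Empty bins are reported as 0.
  def counts(data: Seq[Double], numBins: Int): Vector[Int] = {
    require(numBins > 0, "numBins must be positive")
    require(data.nonEmpty, "data must not be empty")
    val mn = data.min
    val mx = data.max
    val width = (mx - mn) / numBins
    val result = Array.fill(numBins)(0)
    data.foreach { x =>
      // Assumption: if all values are equal (width == 0), use bin 0.
      val raw = if (width == 0.0) 0 else ((x - mn) / width).toInt
      // Clamp so that x == mx falls into the last bin instead of index numBins.
      val idx = math.min(raw, numBins - 1)
      result(idx) += 1
    }
    result.toVector
  }
}
```

With the array from the question, FixedBins.counts(a.toSeq, 20) should give one count per bin, in bin order.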
Another answer, more concise in my opinion:
def mkHistogram(n_bins: Int, lowerUpperBound: Option[(Double, Double)] = None)(xs: Seq[Double]) = {
  val (mn, mx) = lowerUpperBound.getOrElse((xs.min, xs.max))
  val epsilon = 0.0001
  // widen each bin slightly so that the maximum falls inside the last bin
  val binSize = (mx - mn) / n_bins * (1 + epsilon)
  // consecutive bin edges paired up as (lower, upper)
  val bins = (0 to n_bins).map(mn + _ * binSize).sliding(2).map(edges => (edges(0), edges(1)))
  def binContains(bin: (Double, Double), x: Double) = (x >= bin._1) && (x < bin._2)
  bins.map(bin => (bin, xs.count(binContains(bin, _))))
}
@ mkHistogram(5,Option(0,10))(Seq(1,1,1,1,2,2,2,3,4,5,6,7)).foreach(println)
((0.0,2.0002),7)
((2.0002,4.0004),2)
((4.0004,6.0006),2)
((6.0006,8.0008),1)
((8.0008,10.001),0)
I had a similar but slightly different requirement: building a histogram from user-defined bins/cutoffs. Say, in the OP's example, you need bins for 0-3, -4, -5, -6, -7 and 8+. I tried several approaches, but my breakthrough was realizing that I needed to group the array by the position of the bin each value falls into:
val a = Array.tabulate(100){ _ => Random.nextDouble() * 10 }
val bins = List(3,4,5,6,7,8,Int.MaxValue) //-- user-defined cutoff values (with max value at the top)
a.groupBy(i => bins.indexWhere(_>i)) //-- collection of lists fitting this criteria
  .map{case (i,items) => i -> items.length} //-- map for index to number of items in that index's list
In this case, the result is:
Map(0 -> 26, 5 -> 7, 1 -> 5, 6 -> 24, 2 -> 12, 3 -> 15, 4 -> 11)
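
The Map above is keyed by bin index and is unordered, and a bin that receives no values does not show up at all. As a rough sketch of the same indexWhere idea that returns the counts in bin order, one could fill a fixed-size array instead. The names CutoffBins and counts are hypothetical, and the explicit overflow bin replaces the Int.MaxValue sentinel used above; both are assumptions made for this example.

```scala
object CutoffBins {
  // `cutoffs` are exclusive upper bounds in ascending order. Values at or
  // above the last cutoff are counted in one extra overflow bin at the end.
  def counts(data: Seq[Double], cutoffs: Seq[Double]): Vector[Int] = {
    require(cutoffs == cutoffs.sorted, "cutoffs must be ascending")
    val result = Array.fill(cutoffs.length + 1)(0)
    data.foreach { x =>
      val i = cutoffs.indexWhere(_ > x)
      // indexWhere returns -1 when no cutoff exceeds x: use the overflow bin.
      val idx = if (i == -1) cutoffs.length else i
      result(idx) += 1
    }
    result.toVector
  }
}
```

For the cutoffs in this answer, the call would be CutoffBins.counts(a.toSeq, Seq(3.0, 4.0, 5.0, 6.0, 7.0, 8.0)), which yields seven counts: one per cutoff plus the 8+ bin.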