Algorithm: Representing continuous probability distributions
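Tags: algorithm, math, haskell, statistics, probability

I have a problem involving a collection of continuous probability distribution functions, most of which are determined empirically (e.g. departure times, transit times). What I need is some way of taking two of these PDFs and doing arithmetic on them. For example, if I have two values, x taken from PDF X and y taken from PDF Y, I need to get the PDF of (x+y), or of any other operation f(x,y).

An analytical solution is not possible, so what I am looking for is some representation of PDFs that allows such things. An obvious (but computationally expensive) solution is Monte Carlo: generate lots of values of x and y, and then just measure f(x,y). But that takes too much CPU time.

I did think about representing the PDF as a list of ranges where each range has roughly equal probability, effectively representing the PDF as the union of a list of uniform distributions. But I can't see how to combine them.

Does anyone have a good solution to this problem?

Edit: The goal is to create a mini-language (aka Domain Specific Language) for manipulating PDFs. But first I need to sort out the underlying representation and algorithms.

Edit 2: dmckee suggests a histogram implementation. That is what I was getting at with my list of uniform distributions. But I don't see how to combine them to create new distributions. Ultimately, I need to find things like P(x…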
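To make the "list of ranges with roughly equal probability" idea concrete, here is a minimal sketch of how such a representation could be built from an empirical sample. The name `equalAreaHistogram` and its return shape (boundaries, areas) are assumptions made for illustration, not part of the original post. It assumes at least one bar and a non-empty sample; repeated sample values can produce zero-width bars.

```haskell
import Data.List (sort)

-- | Hypothetical helper (not from the original post): split a sample into
-- n bars that each hold roughly the same share of the points.
-- Returns the n+1 bar boundaries and the n bar areas (each 1/n).
-- Assumes n >= 1 and a non-empty sample.
equalAreaHistogram :: Int -> [Double] -> ([Double], [Double])
equalAreaHistogram n samples = (boundaries, areas)
  where
    sorted = sort samples
    m = length sorted
    -- Index of the sorted sample used as the k-th boundary, k = 0..n.
    boundaryIndex k = (k * (m - 1)) `div` n
    boundaries = [sorted !! boundaryIndex k | k <- [0 .. n]]
    areas = replicate n (1 / fromIntegral n)
```

For example, `equalAreaHistogram 2 [5, 1, 3, 2, 4]` gives the boundaries `[1, 3, 5]` with two bars of area 0.5 each.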
-- Imports needed by the (.+.) operator defined further down.
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

-- | Continuous distributions of scalars are represented as a
-- histogram where each bar has approximately constant area but
-- variable width and height. A histogram with N bars is stored as
-- a list of N+1 bar boundaries plus a list of N bar areas.
data Continuous = C {
    cN :: Int,
    -- ^ Number of bars in the histogram.
    cAreas :: [Double],
    -- ^ Areas of the bars. @length cAreas == cN@
    cBars :: [Double]
    -- ^ Boundaries of the bars. @length cBars == cN + 1@
    } deriving (Show, Read)
{- | Add distributions. If two random variables @vX@ and @vY@ are
taken from distributions @x@ and @y@ respectively then the
distribution of @(vX + vY)@ will be @(x .+. y)@.
This is implemented as the convolution of distributions x and y.
Each is a histogram, which is to say the sum of a collection of
uniform distributions (the "bars"). Therefore the convolution can be
computed as the sum of the convolutions of the cross product of the
components of x and y.
When you convolve two uniform distributions of unequal size you get a
trapezoidal distribution. Let p = p2-p1, q = q2-q1. Then we get:
> | |
> | ______ |
> | | | with | _____________
> | | | | | |
> +-----+----+------- +--+-----------+-
> p1 p2 q1 q2
>
> gives h|....... _______________
> | /: :\
> | / : : \ 1
> | / : : \ where h = -
> | / : : \ q
> | / : : \
> +--+-----+-------------+-----+-----
> p1+q1 p2+q1 p1+q2 p2+q2
However we cannot keep the trapezoid in the final result because our
representation is restricted to uniform distributions. So instead we
store a uniform approximation to the trapezoid with the same area:
> h|......___________________
> | | / \ |
> | |/ \|
> | | |
> | /| |\
> | / | | \
> +-----+-------------------+--------
> p1+q1+p/2 p2+q2-p/2
-}
(.+.) :: Continuous -> Continuous -> Continuous
c .+. d = C {cN = length bars - 1,
             cBars = map fst bars,
             cAreas = zipWith barArea bars (tail bars)}
   where
      -- Needs groupBy and sortBy (Data.List), comparing (Data.Ord) and
      -- on (Data.Function).
      -- The convolve function returns a list of two (x, deltaY) pairs.
      -- These can be sorted by x and then sequentially summed to get
      -- the new histogram. The "b" parameter is the product of the
      -- areas of the input bars, which was omitted from the diagrams
      -- above (they show the unit-area case, where h = 1/q).
      convolve b c1 c2 d1 d2 =
         if (c2-c1) < (d2-d1)
            then convolve1 b c1 c2 d1 d2
            else convolve1 b d1 d2 c1 c2
      -- p1..p2 is the narrower bar, q1..q2 the wider one.
      convolve1 b p1 p2 q1 q2 =
         [(p1+q1+halfP, h), (p2+q2-halfP, (-h))]
         where
            halfP = (p2-p1)/2
            h = b / (q2-q1)
      outline = map sumGroup $ groupBy ((==) `on` fst) $ sortBy (comparing fst)
         $ concat
            [convolve (areaC*areaD) c1 c2 d1 d2 |
             (c1, c2, areaC) <- zip3 (cBars c) (tail $ cBars c) (cAreas c),
             (d1, d2, areaD) <- zip3 (cBars d) (tail $ cBars d) (cAreas d)
            ]
      -- Merge height changes that fall on the same x coordinate.
      sumGroup pairs = (fst $ head pairs, sum $ map snd pairs)
      -- Running sum of the height changes gives the height after each x.
      bars = tail $ scanl (\(_,y) (x2,dy) -> (x2, y+dy)) (0, 0) outline
      barArea (x1, h) (x2, _) = (x2 - x1) * h
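As a way to sanity-check the uniform approximation described above, the following sketch applies it to individual bars and confirms that means add. The `Bar` type and the names `addBar`, `addBars` and `meanBars` are hypothetical, introduced only for this illustration. Unlike `(.+.)`, it does not merge the overlapping result bars back into a single histogram.

```haskell
-- | One histogram bar: (left edge, right edge, area). Hypothetical type,
-- used instead of the Continuous record to keep the sketch short.
type Bar = (Double, Double, Double)

-- | Uniform approximation to the convolution of two bars: as wide as the
-- wider input, centred on the sum of the input midpoints, and with an
-- area equal to the product of the input areas.
addBar :: Bar -> Bar -> Bar
addBar (c1, c2, a) (d1, d2, b)
  | c2 - c1 < d2 - d1 = go c1 c2 d1 d2
  | otherwise         = go d1 d2 c1 c2
  where
    -- p1..p2 is the narrower bar, q1..q2 the wider one.
    go p1 p2 q1 q2 = (p1 + q1 + halfP, p2 + q2 - halfP, a * b)
      where halfP = (p2 - p1) / 2

-- | All pairwise bar sums (overlapping results are left unmerged).
addBars :: [Bar] -> [Bar] -> [Bar]
addBars xs ys = [addBar x y | x <- xs, y <- ys]

-- | Mean of a list of bars, treating each bar as a uniform distribution.
meanBars :: [Bar] -> Double
meanBars bs =
  sum [a * (lo + hi) / 2 | (lo, hi, a) <- bs] / sum [a | (_, _, a) <- bs]
```

Because each approximating bar is centred on the sum of the two input midpoints, `meanBars (addBars xs ys)` should match `meanBars xs + meanBars ys` up to rounding, even though the shape of each trapezoid is lost.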
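struct histogram_struct {
    int bins;      /* Number of bins; bin widths are assumed to be uniform */
    double low;    /* Lower edge of the histogram range */
    double high;   /* Upper edge of the histogram range */
    /* double normalization; */
    /* double *errors; */ /* if using, initialize with enough space,
                           * and store _squared_ errors
                           */
    double contents[];  /* Flexible array member (C99): one entry per bin */
};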
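For comparison with the Haskell code elsewhere on this page, here is a rough Haskell counterpart of the uniform-bin struct, together with a simple way to fill it from samples. The names `Histogram`, `binIndex` and `fillHistogram` are assumptions made for this sketch, and the half-open range convention [low, high) is a choice made here, not something the struct specifies.

```haskell
-- | Fixed-width bins over [low, high); hypothetical counterpart of the C struct.
data Histogram = Histogram
  { hLow      :: Double
  , hHigh     :: Double
  , hContents :: [Double]  -- ^ One count per bin.
  } deriving (Show, Eq)

-- | Bin index of a value, or Nothing if it lies outside [low, high).
binIndex :: Int -> Double -> Double -> Double -> Maybe Int
binIndex bins low high x
  | x < low || x >= high = Nothing
  | otherwise =
      -- Clamp to the last bin to guard against rounding at the upper edge.
      Just (min (bins - 1) (floor ((x - low) / (high - low) * fromIntegral bins)))

-- | Count samples into equal-width bins; out-of-range samples are dropped.
-- Simple O(samples * bins) counting, which is fine for a sketch.
fillHistogram :: Int -> Double -> Double -> [Double] -> Histogram
fillHistogram bins low high xs = Histogram low high counts
  where
    hits = [i | Just i <- map (binIndex bins low high) xs]
    counts = [fromIntegral (length (filter (== i) hits)) | i <- [0 .. bins - 1]]
```

Dividing each count by the number of samples and by the bin width would turn the contents into a density estimate.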
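-- A measure is represented by how it integrates a test function.
type Measure = (Double -> Double) -> Double

mean :: Measure -> Double
mean mu = mu id

variance :: Measure -> Double
variance mu = (mu $ \x -> x ^ 2) - (mean mu) ^ 2

cdf :: Measure -> Double -> Double
cdf mu x = mu $ \z -> if z < x then 1 else 0

-- Empirical measure of a sample: the average of f over the data points,
-- normalised so that the total mass is 1.
empirical :: [Double] -> Measure
empirical xs f = sum (map f xs) / fromIntegral (length xs)

-- my_favorite_quadrature_method is a placeholder for any numerical
-- integration routine that computes the integral of rho(x) * f(x).
from_pdf :: (Double -> Double) -> Measure
from_pdf rho f = my_favorite_quadrature_method rho f

-- Convolution: the distribution of the sum of two independent variables.
-- This clashes with Prelude's (**), so use "import Prelude hiding ((**))".
(**) :: Measure -> Measure -> Measure
(mu ** nu) f = nu $ \y -> (mu $ \x -> f $ x + y)

rescale :: Double -> Measure -> Measure
rescale a mu f = mu $ \x -> f (a * x)

apply :: (Double -> Double) -> Measure -> Measure
apply phi mu f = mu $ f . phi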
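-- Generalised version of the Double-only Measure used above. A newtype
-- needs a constructor, and fmap is the apply function rewritten for it.
newtype Measure a = Measure ((a -> Double) -> Double)
instance Functor Measure where
  fmap phi (Measure mu) = Measure $ \f -> mu (f . phi)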
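-- gauss (a density) and samples (a list of data points) are assumed to be
-- defined elsewhere; "data" is a reserved word, hence the name samples.
m = mean $ apply cos ((from_pdf gauss) ** (empirical samples))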
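The measure-based snippets leave the quadrature method unspecified, so they do not compile as they stand. Below is a minimal self-contained sketch that fills the gap with a midpoint rule on a fixed interval. The names `fromPdfOn` and `convolve`, the explicit interval bounds and the panel count are all assumptions added here. `convolve` plays the role of the `**` operator, renamed to avoid the clash with the Prelude.

```haskell
-- | A measure is identified with how it integrates a test function.
type Measure = (Double -> Double) -> Double

-- | Midpoint-rule quadrature on [a, b] with n panels. This is a stand-in
-- (an assumption) for the unspecified quadrature method.
fromPdfOn :: Double -> Double -> Int -> (Double -> Double) -> Measure
fromPdfOn a b n rho f = sum [rho x * f x * h | x <- mids]
  where
    h = (b - a) / fromIntegral n
    mids = [a + (fromIntegral i + 0.5) * h | i <- [0 .. n - 1]]

-- | Normalised empirical measure: the average of f over the samples.
empirical :: [Double] -> Measure
empirical xs f = sum (map f xs) / fromIntegral (length xs)

-- | Distribution of the sum of two independent variables.
convolve :: Measure -> Measure -> Measure
convolve mu nu f = nu $ \y -> mu $ \x -> f (x + y)

mean :: Measure -> Double
mean mu = mu id

variance :: Measure -> Double
variance mu = mu (\x -> x ^ 2) - mean mu ^ 2
```

With `uniform01 = fromPdfOn 0 1 1000 (const 1)`, the expression `mean (convolve uniform01 (empirical [1, 2, 3]))` should come out close to 2.5, the sum of the two means. Each evaluation of a convolved measure costs one inner integration per outer sample point, so nesting several convolutions can get expensive quickly.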