Representing continuous probability distributions

algorithm math haskell statistics

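I have a problem involving a collection of continuous probability distribution functions, most of which are determined empirically (e.g. departure times, transit times). What I need is some way of taking two of these PDFs and doing arithmetic on them. For example, if I have two values, x taken from PDF X and y taken from PDF Y, I need to get the PDF of (x+y), or of any other operation f(x,y).

An analytical solution is not possible, so what I'm looking for is some representation of PDFs that allows such things. An obvious (but computationally expensive) solution is Monte Carlo: generate lots of values of x and y and then just measure f(x,y). But that takes too much CPU time.

I did think about representing the PDF as a list of ranges where each range has roughly equal probability, effectively representing the PDF as the union of a list of uniform distributions. But I can't see how to combine them.

Does anyone have a good solution to this problem?

Edit: The goal is to create a mini-language (a.k.a. a domain-specific language) for manipulating PDFs. But first I need to sort out the underlying representation and algorithms.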

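Edit 2: dmckee suggests a histogram implementation. That is what I was getting at with my list of uniform distributions. But I don't see how to combine them to create new distributions. Ultimately I need to find things like P(x …

Edit 3: I have a bunch of histograms. They are not evenly spaced, because I generate them from occurrence data: if I have 100 samples and I want 10 points in the histogram, I allocate 10 samples to each bar and make the bars variable in width but constant in area.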
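
The equal-area construction described in Edit 3 might look roughly like the following. This is a minimal sketch, not the author's code: the name `equalAreaHistogram` and the choice of picking boundaries at evenly spaced ranks of the sorted samples are assumptions made for illustration, and it presumes at least one bar and more samples than bars.

```haskell
import Data.List (sort)

-- | Hypothetical helper: build an equal-area histogram from raw samples.
-- Returns (boundaries, areas), with one more boundary than there are bars.
-- Assumes n >= 1 and more than n samples; repeated sample values can
-- still produce zero-width bars, which this sketch does not handle.
equalAreaHistogram :: Int -> [Double] -> ([Double], [Double])
equalAreaHistogram n samples = (bounds, areas)
  where
    sorted = sort samples
    m      = length sorted
    -- Rank of the sample used as the k-th boundary (k = 0 .. n).
    idx k  = (k * (m - 1)) `div` n
    bounds = [sorted !! idx k | k <- [0 .. n]]
    -- Every bar is given the same share of the total probability.
    areas  = replicate n (1 / fromIntegral n)
```

The height of each bar is then its area divided by its width, so narrow bars mark regions where samples are dense.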

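I've figured out that to add PDFs you convolve them, and I've brushed up on the maths for that. When you convolve two uniform distributions you get a new distribution with three sections: the wider uniform distribution is still there, but with a triangle on each side, each as wide as the narrower distribution. So if I convolve each element of X with each element of Y I'll get a bunch of these, all overlapping. Now I'm trying to figure out how to sum them all and then get a histogram that is the best approximation to the result.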
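
The three-section shape described above can be written down as a density function. The following is a minimal sketch, assuming independent X ~ Uniform(p1, p2) and Y ~ Uniform(q1, q2) with positive widths; the name `uniformSumPdf` is hypothetical.

```haskell
-- | Density of X + Y at t for independent X ~ Uniform(p1, p2) and
-- Y ~ Uniform(q1, q2).  Assumes p2 > p1 and q2 > q1.
uniformSumPdf :: (Double, Double) -> (Double, Double) -> Double -> Double
uniformSumPdf (p1, p2) (q1, q2) t
  | t <= lo || t >= hi = 0
  | t < lo + narrow    = h * (t - lo) / narrow   -- rising triangle
  | t <= hi - narrow   = h                       -- flat middle section
  | otherwise          = h * (hi - t) / narrow   -- falling triangle
  where
    narrow = min (p2 - p1) (q2 - q1)
    wide   = max (p2 - p1) (q2 - q1)
    lo     = p1 + q1
    hi     = p2 + q2
    h      = 1 / wide   -- height of the flat section
```

When both inputs have the same width, the flat section shrinks to a single point and the result is a plain triangle.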

I'm beginning to wonder whether Monte Carlo was such a bad idea after all.

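Edit 4: Convolutions of uniform distributions have been discussed in some detail. In general you get a "trapezoid" distribution. Since each "column" in the histograms is a uniform distribution, I had hoped that the problem could be solved by convolving these columns and summing the results.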

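However, the result is considerably more complex than the inputs, and it also includes triangles.

Edit 5: [Wrong content removed]. But if these trapezoids are approximated by rectangles with the same area, then you get the right answer, and reducing the number of rectangles in the result looks pretty straightforward too. This might be the solution I've been looking for.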
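
The equal-area claim can be checked directly. As a sketch, write p = p2 - p1 and q = q2 - q1 for the widths of the narrower and wider input bars (p ≤ q) and h for the height of the trapezoid's flat top. The trapezoid then has a base of width p + q and a top of width q - p, and the rectangle is assumed to run from p1 + q1 + p/2 to p2 + q2 - p/2 at the same height:

```latex
\begin{align*}
A_{\text{trapezoid}} &= \tfrac{h}{2}\bigl[(p + q) + (q - p)\bigr] = hq, \\
A_{\text{rectangle}} &= h\bigl[(p_2 + q_2 - \tfrac{p}{2}) - (p_1 + q_1 + \tfrac{p}{2})\bigr]
                      = h\,(p + q - p) = hq.
\end{align*}
```

Both areas equal hq, so replacing each trapezoid by this rectangle preserves total probability; only the sloped edges are lost.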

Edit 6: Solved! Here is the final Haskell code for this problem:

import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

-- | Continuous distributions of scalars are represented as a
-- histogram where each bar has approximately constant area but
-- variable width and height.  A histogram with N bars is stored as
-- a list of N+1 values.
data Continuous = C {
      cN :: Int,
      -- ^ Number of bars in the histogram.
      cAreas :: [Double],
      -- ^ Areas of the bars.  @length cAreas == cN@
      cBars :: [Double]
      -- ^ Boundaries of the bars.  @length cBars == cN + 1@
    } deriving (Show, Read)


{- | Add distributions.  If two random variables @vX@ and @vY@ are
taken from distributions @x@ and @y@ respectively then the
distribution of @(vX + vY)@ will be @(x .+. y)@.

This is implemented as the convolution of distributions x and y.
Each is a histogram, which is to say the sum of a collection of
uniform distributions (the "bars").  Therefore the convolution can be
computed as the sum of the convolutions of the cross product of the
components of x and y.

When you convolve two uniform distributions of unequal size you get a
trapezoidal distribution. Let p = p2-p1, q = q2-q1.  Then we get:


>   |                              |
>   |     ______                   |
>   |     |    |           with    |  _____________
>   |     |    |                   |  |           |
>   +-----+----+-------            +--+-----------+-
>         p1   p2                     q1          q2
> 
>  gives    h|....... _______________
>            |       /:             :\
>            |      / :             : \                1
>            |     /  :             :  \     where h = -
>            |    /   :             :   \              q
>            |   /    :             :    \
>            +--+-----+-------------+-----+-----
>             p1+q1  p2+q1       p1+q2   p2+q2

However we cannot keep the trapezoid in the final result because our
representation is restricted to uniform distributions.  So instead we
store a uniform approximation to the trapezoid with the same area:

>           h|......___________________
>            |     | /               \ |
>            |     |/                 \|
>            |     |                   |
>            |    /|                   |\
>            |   / |                   | \
>            +-----+-------------------+--------
>               p1+q1+p/2          p2+q2-p/2

-}
(.+.) :: Continuous -> Continuous -> Continuous
c .+. d = C {cN     = length bars - 1,
             cBars  = map fst bars, 
             cAreas = zipWith barArea bars (tail bars)}
    where
      -- The convolve function returns a list of two (x, deltaY) pairs.
      -- These can be sorted by x and then sequentially summed to get
      -- the new histogram.  The "b" parameter is the product of the
      -- areas of the input bars, which was omitted from the diagrams
      -- above.
      convolve b c1 c2 d1 d2 =
          if (c2-c1) < (d2-d1)
              then convolve1 b c1 c2 d1 d2
              else convolve1 b d1 d2 c1 c2
      convolve1 b p1 p2 q1 q2 = 
          [(p1+q1+halfP, h), (p2+q2-halfP, (-h))]
               where 
                 halfP = (p2-p1)/2
                 h = b / (q2-q1)
      outline = map sumGroup $ groupBy ((==) `on` fst) $ sortBy (comparing fst) $ concat
                [convolve (areaC*areaD) c1 c2 d1 d2 |
                 (c1, c2, areaC) <- zip3 (cBars c) (tail $ cBars c) (cAreas c),
                 (d1, d2, areaD) <- zip3 (cBars d) (tail $ cBars d) (cAreas d)
                ]
      sumGroup pairs = (fst $ head pairs, sum $ map snd pairs)

      bars = tail $ scanl (\(_,y) (x2,dy) -> (x2, y+dy)) (0, 0) outline
      barArea (x1, h) (x2, _) = (x2 - x1) * h
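
Once a distribution is held as bar boundaries plus bar areas, probabilities of the form P(X < t) can be read off by summing the parts of the bars that lie to the left of t. The following is a minimal sketch rather than part of the solution above: `histCdf` is a hypothetical name, it takes the two lists directly (in the same layout as `cBars` and `cAreas`), and it assumes the areas sum to 1 and every bar has positive width.

```haskell
-- | P(X < t) for a histogram given as bar boundaries and bar areas.
-- Assumes length bounds == length areas + 1 and strictly increasing bounds.
histCdf :: [Double] -> [Double] -> Double -> Double
histCdf bounds areas t = sum (zipWith3 part bounds (drop 1 bounds) areas)
  where
    -- Portion of one bar's area lying to the left of t; within a bar the
    -- density is uniform, so the portion grows linearly with t.
    part x1 x2 a
      | t <= x1   = 0
      | t >= x2   = a
      | otherwise = a * (t - x1) / (x2 - x1)
```

For example, `histCdf [0, 1, 3] [0.5, 0.5] 2` takes the whole first bar and half of the second.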

struct histogram_struct {
  int bins; /* Assumed to be uniform */
  double low;
  double high;
  /* double normalization; */    
  /* double *errors; */ /* if using, initialize with enough space, 
                         * and store _squared_ errors
                         */
  double contents[];
};

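import Prelude hiding ((**))

-- | A measure, represented by the functional that integrates a test
-- function against it.
newtype Measure a = Measure { integrate :: (a -> Double) -> Double }

instance Functor Measure where
    fmap = apply

mean :: Measure Double -> Double
mean mu = integrate mu id

variance :: Measure Double -> Double
variance mu = integrate mu (\x -> x ^ 2) - mean mu ^ 2

cdf :: Measure Double -> Double -> Double
cdf mu x = integrate mu $ \z -> if z < x then 1 else 0

-- | Counting measure of a sample (not normalised: divide by the sample
-- size to obtain a probability measure).
empirical :: [a] -> Measure a
empirical xs = Measure $ \f -> sum (map f xs)

-- | Measure with density rho.  my_favorite_quadrature_method is a
-- placeholder for any routine that approximates the integral of
-- rho x * f x; it is not defined here.
from_pdf :: (Double -> Double) -> Measure Double
from_pdf rho = Measure $ \f -> my_favorite_quadrature_method rho f

-- | Convolution: the measure of x + y for independent x and y.
(**) :: Measure Double -> Measure Double -> Measure Double
mu ** nu = Measure $ \f -> integrate nu $ \y -> integrate mu $ \x -> f (x + y)

rescale :: Double -> Measure Double -> Measure Double
rescale a mu = Measure $ \f -> integrate mu $ \x -> f (a * x)

-- | Push a measure forward through phi.
apply :: (a -> b) -> Measure a -> Measure b
apply phi mu = Measure $ \f -> integrate mu (f . phi)

-- gauss (a density) and samples (a list of observations) are assumed
-- to be defined elsewhere.
m = mean $ apply cos (from_pdf gauss ** empirical samples)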
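
The listing above leaves `my_favorite_quadrature_method` open. As one possible stand-in, the sketch below uses a composite trapezoid rule on a finite interval; the names `fromPdfOn`, `convolveM` and `uniform01`, the plain type synonym for `Measure`, and the panel count are all assumptions made for illustration, and any other quadrature scheme could be substituted.

```haskell
-- | A measure is represented by how it integrates a test function.
type Measure = (Double -> Double) -> Double

-- | Hypothetical quadrature-backed constructor: integrates rho x * f x
-- over [a, b] with the composite trapezoid rule using n panels (n >= 1).
fromPdfOn :: (Double, Double) -> Int -> (Double -> Double) -> Measure
fromPdfOn (a, b) n rho f = step * (sum (map g inner) + (g a + g b) / 2)
  where
    g x   = rho x * f x
    step  = (b - a) / fromIntegral n
    inner = [a + fromIntegral k * step | k <- [1 .. n - 1]]

-- | Measure of x + y for independent x and y: integrate f (x + y)
-- against both measures in turn.
convolveM :: Measure -> Measure -> Measure
convolveM mu nu f = nu (\y -> mu (\x -> f (x + y)))

mean :: Measure -> Double
mean mu = mu id

-- | Uniform density on [0, 1], discretised with 200 panels.
uniform01 :: Measure
uniform01 = fromPdfOn (0, 1) 200 (const 1)
```

Each nested convolution multiplies the number of function evaluations (here 200 × 200 for a single sum), so this representation is convenient for composing operations but becomes expensive as expressions grow.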