Why is my Python implementation of the Metropolis algorithm (MCMC) so slow?

Tags: python, performance, machine-learning, random, mcmc

I am trying to implement this algorithm in Python (a simple version of the Metropolis-Hastings algorithm).

Here is my implementation:

import numpy as np

def Metropolis_Gaussian(p, z0, sigma, n_samples=100, burn_in=0, m=1):
    """
    Metropolis Algorithm using a Gaussian proposal distribution.
    p: distribution that we want to sample from (can be unnormalized)
    z0: Initial sample
    sigma: standard deviation of the proposal normal distribution.
    n_samples: number of final samples that we want to obtain.
    burn_in: number of initial samples to discard.
    m: this number is used to take every mth sample at the end
    """
    # List of samples, check feasibility of first sample and set z to first sample
    sample_list = [z0]
    _ = p(z0) 
    z = z0
    # set a counter of samples for burn-in
    n_sampled = 0

    while len(sample_list[::m]) < n_samples:
        # Sample a candidate from Normal(z, sigma), draw a uniform sample, compute the acceptance probability
        cand = np.random.normal(loc=z, scale=sigma)
        u = np.random.rand()
        try:
            prob = min(1, p(cand) / p(z))
        except (OverflowError, ValueError):
            continue
        n_sampled += 1

        if prob > u:
            z = cand  # accept and make candidate the new sample

        # do not add burn-in samples
        if n_sampled > burn_in:
            sample_list.append(z)

    # Finally, take every m-th sample in order to achieve independence
    return np.array(sample_list)[::m]
This code takes quite a long time to run, and I am not sure why. In my Metropolis_Gaussian code, I have tried to improve efficiency by:

  • not adding duplicate samples to the list
  • not recording burn-in samples

The function pdf_t is defined as follows:

    from scipy.stats import t
    def pdf_t(x, df=10):
        return t.pdf(x, df=df)
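
For reference, the sampler is then invoked along these lines (a minimal sketch; the argument values mirror the timing runs quoted at the end of this post):

    # assumes Metropolis_Gaussian and pdf_t defined above are in scope
    samples = Metropolis_Gaussian(pdf_t, z0=3, sigma=1, n_samples=100, burn_in=100, m=10)
    print(samples.shape)  # (100,)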
    
I have answered a similar question before. A lot of the things I mentioned there (not computing the current likelihood on every iteration, pre-computing the random innovations, etc.) can be used here.

Another improvement to the implementation is to not use a list to store your samples. Instead, you should pre-allocate the memory for the samples and store them in an array. Something like samples = np.zeros(n_samples) is more efficient than appending to a list at every iteration.
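
To make the difference concrete, here is a minimal sketch of the two storage patterns (the loop body is just a stand-in for the sampler's work):

    import numpy as np

    n = 100_000

    # what the question's code does: grow a Python list, convert at the end
    out = []
    for i in range(n):
        out.append(i * 0.5)
    a = np.array(out)

    # the suggested alternative: pre-allocate once, fill in place
    b = np.zeros(n)
    for i in range(n):
        b[i] = i * 0.5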

You have already mentioned that you tried to improve efficiency by not recording the burn-in samples. That is a good idea. You can achieve a similar refinement by only recording every m-th sample, since these are the only ones you keep anyway in the return statement np.array(sample_list)[::m]. You can do this by changing:

    # do not add burn-in samples
    if n_sampled > burn_in:
        sample_list.append(z)
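
to something along these lines (the target snippet is missing from the post as extracted; this reconstruction mirrors the modulo check used in the full rewrite below):

    # keep only post-burn-in samples, and of those only every m-th
    # (the loop condition can then become len(sample_list) < n_samples)
    if n_sampled > burn_in and n_sampled % m == 0:
        sample_list.append(z)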
    

It is also worth noting that you do not need to compute min(1, p(cand) / p(z)), but only p(cand) / p(z). I realize that formally the min is necessary (to ensure that the acceptance probability lies between 0 and 1). Computationally, however, the min is unnecessary: if p(cand) / p(z) > 1, then p(cand) / p(z) is always greater than u.
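
Since u is drawn from [0, 1), the two acceptance tests agree on every draw. A quick numerical check (a sketch, using a lognormal variable as a stand-in for the density ratio):

    import numpy as np

    np.random.seed(0)
    ratio = np.random.lognormal(size=100_000)  # stand-in for p(cand) / p(z); can exceed 1
    u = np.random.rand(100_000)                # uniform draws on [0, 1)
    assert np.array_equal(np.minimum(1.0, ratio) > u, ratio > u)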

Putting all of this together, along with pre-computing the random innovations and the acceptance draws u, and only computing the likelihood when you really need it, I came up with:

    def my_Metropolis_Gaussian(p, z0, sigma, n_samples=100, burn_in=0, m=1):
        # Pre-allocate memory for samples (much more efficient than using append)
        samples = np.zeros(n_samples)

        # Initialize the first slot (overwritten once the first sample is recorded)
        samples[0] = z0
        z = z0
        # Compute the current likelihood
        l_cur = p(z)

        # Iteration counter
        i = 0
        # Total number of iterations needed to achieve the desired number of samples
        iters = (n_samples * m) + burn_in

        # Pre-compute the random innovations and uniform draws outside the loop
        innov = np.random.normal(loc=0, scale=sigma, size=iters)
        u = np.random.rand(iters)

        while i < iters:
            # Random-walk innovation on z
            cand = z + innov[i]

            # Compute the candidate likelihood
            l_cand = p(cand)

            # Accept or reject the candidate
            if l_cand / l_cur > u[i]:
                z = cand
                l_cur = l_cand

            # Only keep iterations after burn-in, and of those only every m-th
            # (using >= ensures samples[0] is overwritten even when burn_in > 0)
            if i >= burn_in and i % m == 0:
                samples[(i - burn_in) // m] = z

            i += 1

        return samples
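
As a quick sanity check (a sketch, not part of the original answer), both samplers should give a sample mean near 0 for the Student-t density defined earlier:

    np.random.seed(42)
    old = Metropolis_Gaussian(pdf_t, 3, 1, n_samples=1000, burn_in=100, m=10)
    new = my_Metropolis_Gaussian(pdf_t, 3, 1, n_samples=1000, burn_in=100, m=10)
    print(old.mean(), new.mean())  # both should be close to 0 for t(df=10)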
    

A possible duplicate of this question has already been asked on this site. While the title may not sound like the same question, the answer I gave there is the same one I would give here. I emphasize once more that not including the duplicates produced by failed acceptances is asymptotically incorrect, and leads to an over-representation of low-likelihood sample values.
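
To make that point concrete, here is a minimal toy chain (a sketch against a hypothetical unnormalized standard-normal target, not code from the original posts) that records the current state on every iteration, whether or not the proposal was accepted:

    import numpy as np

    def p_toy(x):
        # hypothetical unnormalized standard-normal target, for illustration only
        return np.exp(-0.5 * x**2)

    rng = np.random.default_rng(0)
    z, chain = 0.0, []
    for _ in range(1000):
        cand = z + rng.normal()
        if p_toy(cand) / p_toy(z) > rng.random():
            z = cand
        chain.append(z)  # append z even after a rejection; the repeats belong to the chain

For reference, the timings below compare the two implementations: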
    In [1]: %timeit Metropolis_Gaussian(pdf_t, 3, 1, n_samples=100, burn_in=100, m=10)
    205 ms ± 2.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [2]: %timeit my_Metropolis_Gaussian(pdf_t, 3, 1, n_samples=100, burn_in=100, m=10)
    102 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)