Multidimensional confidence intervals in Python

python matplotlib scipy

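I have many tuples (par1, par2), i.e. points in a two-dimensional parameter space obtained by repeating an experiment many times.

I am looking for a way to compute and visualize confidence ellipses (not sure if that is the correct term). Below is an example plot I found online that shows what I mean:

Source: blogspot.ch/2011/07/classification-and-discrimination-with.html

So, in principle, I suppose one has to fit a multivariate normal distribution to a 2D histogram of the data points. Can anyone help me with this?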

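I think what you are looking for is a way to compute confidence regions.

I don't know much about it, but as a starting point I would check the Sherpa application for Python. At least, in their SciPy 2011 talk, the authors mention that you can use it to determine and obtain confidence intervals (though you may need to have a model for the data).

See the Sherpa talk and the corresponding material.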

HTH

It sounds like you just want the 2-sigma ellipse of the scatter of points.

If so, consider something like this (taken from some code for a paper):
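A minimal sketch of that idea, assuming the points are stored as an n×2 array and are roughly normally distributed. The helper name `nsigma_ellipse` and the sample data are hypothetical and only illustrate the eigen-decomposition approach:

```python
import numpy as np


def nsigma_ellipse(points, nstd=2.0):
    """Return (center, width, height, angle_deg) of an n-sigma error ellipse.

    Hypothetical helper: `points` is an (n, 2) array, `nstd` is the number
    of standard deviations that each semi-axis spans.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)

    # eigh returns eigenvalues in ascending order; put the major axis first
    vals, vecs = np.linalg.eigh(cov)
    order = vals.argsort()[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # angle of the major axis with respect to the x axis, in degrees
    angle = np.degrees(np.arctan2(vecs[1, 0], vecs[0, 0]))

    # full axis lengths: nstd standard deviations on each side of the center
    width, height = 2.0 * nstd * np.sqrt(vals)
    return center, width, height, angle


# illustrative data only
rng = np.random.default_rng(0)
pts = rng.multivariate_normal((100, 420), [[120, 80], [80, 80]], size=400)
center, width, height, angle = nsigma_ellipse(pts, nstd=2.0)
print(center, width, height, angle)
```

The returned values can be passed to `matplotlib.patches.Ellipse(xy=center, width=width, height=height, angle=angle, fill=False)` and added to an axes with `ax.add_patch(...)`.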

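I slightly modified one of the examples above that plots the contours of the error or confidence regions. I now think it gives the correct contours.

It was giving the wrong contours because it applied the scoreatpercentile method to the joint data set (blue + red points), whereas the method should be applied to each data set separately.

The modified code is shown below: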

import numpy
import scipy.stats
import matplotlib.pyplot as plt

# generate two normally distributed 2d arrays
x1 = numpy.random.multivariate_normal((100, 420), [[120, 80], [80, 80]], 400)
x2 = numpy.random.multivariate_normal((140, 340), [[90, -70], [-70, 80]], 400)

# fit a KDE to each data set separately
pdf1 = scipy.stats.gaussian_kde(x1.T)
pdf2 = scipy.stats.gaussian_kde(x2.T)

# create a grid over which we can evaluate the pdfs
xs = range(50, 200, 10)   # 15 grid points
ys = range(300, 500, 10)  # 20 grid points
q, w = numpy.meshgrid(xs, ys)
r1 = pdf1([q.flatten(), w.flatten()])
r2 = pdf2([q.flatten(), w.flatten()])

# sample each pdf and find the density at the 5th percentile of the samples;
# about 95% of the probability mass lies at densities above this value
s1 = scipy.stats.scoreatpercentile(pdf1(pdf1.resample(1000)), 5)
s2 = scipy.stats.scoreatpercentile(pdf2(pdf2.resample(1000)), 5)

# reshape back to 2d (rows follow ys, columns follow xs)
r1.shape = q.shape
r2.shape = q.shape

# plot the contour that encloses about 95% of each distribution
plt.contour(xs, ys, r1, [s1], colors='b')
plt.contour(xs, ys, r2, [s2], colors='r')

# scatter plot the two normal distributions
plt.scatter(x1[:, 0], x1[:, 1], alpha=0.3)
plt.scatter(x2[:, 0], x2[:, 1], c='r', alpha=0.3)
plt.show()

See the referenced post.

Here is a Python implementation:

import numpy as np
from scipy.stats import norm, chi2

def cov_ellipse(cov, q=None, nsig=None, **kwargs):
    """
    Parameters
    ----------
    cov : (2, 2) array
        Covariance matrix.
    q : float, optional
        Confidence level, should be in (0, 1)
    nsig : int, optional
        Confidence level in units of standard deviations.
        E.g. 1 stands for 68.3% and 2 stands for 95.4%.

    Returns
    -------
    width, height, rotation :
        The lengths of the two axes and the rotation angle in degrees
        for the ellipse.
    """

    if q is not None:
        q = np.asarray(q)
    elif nsig is not None:
        # convert the number of sigmas into the equivalent 1D confidence level
        q = 2 * norm.cdf(nsig) - 1
    else:
        raise ValueError('One of `q` and `nsig` should be specified.')
    # squared radius enclosing probability q for a 2D normal distribution
    r2 = chi2.ppf(q, 2)

    # eigh returns eigenvalues in ascending order, so `width` belongs to the
    # first eigenvector, which is also the one used for the rotation
    val, vec = np.linalg.eigh(cov)
    width, height = 2 * np.sqrt(np.multiply.outer(val, r2))
    rotation = np.degrees(np.arctan2(*vec[::-1, 0]))

    return width, height, rotation

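The meaning of the standard deviation is wrong in Joe Kington's answer.

Usually we use 1 and 2 sigma to denote the 68% and 95% confidence levels, but the 2-sigma ellipse in his answer does not contain 95% of the probability of the total distribution. The correct way is to use a chi-square distribution to set the ellipse size, as shown in the referenced post.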
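To make the difference concrete, here is a small check, assuming the data follow a 2D normal distribution. In that case the squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, so the plain 2-sigma ellipse covers only about 86.5% of the probability. The variable names are illustrative:

```python
import numpy as np
from scipy.stats import chi2, norm

coverage = {}       # probability inside the plain n-sigma ellipse
scaled_radius = {}  # radius (in sigmas) needed to reach the 1D n-sigma level

for nsig in (1, 2, 3):
    # mass of a 2D Gaussian inside Mahalanobis radius nsig
    coverage[nsig] = chi2.cdf(nsig**2, df=2)
    # the familiar 1D level (68.3%, 95.4%, 99.7%)
    level = 2 * norm.cdf(nsig) - 1
    # Mahalanobis radius that encloses that level in 2D
    scaled_radius[nsig] = np.sqrt(chi2.ppf(level, df=2))
    print(f"{nsig}-sigma ellipse covers {coverage[nsig]:.3f}; "
          f"covering {level:.3f} needs a radius of {scaled_radius[nsig]:.3f} sigma")
```

Which of the two conventions is appropriate depends on whether the ellipse is meant as an n-sigma error ellipse or as a region containing a stated fraction of the distribution.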

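What is the input data? Is it a 2D array of points? Do you know in advance that there are two clusters?

Yes, I know the number of clusters. I don't know the format of the input data yet; I guess it is an n×2 array, where n is the number of points.

In that case, you should cluster the points first, then fit a Gaussian to each cluster, and finally plot the confidence intervals. Take a look at sklearn.cluster.

I also looked at the Sherpa documentation, but I don't actually know what it is :)

Great, thanks for your answer. I hope I got it right: assuming a multivariate normal distribution, we can simply take the eigenvalues and eigenvectors to compute the ellipse. Unfortunately, matplotlib patches cannot be drawn on logarithmic axes (or at least not correctly), which is what I need... Why is life so complicated?

@JoeKington Don't we need to refer to a chi-square probability distribution table to find out our nstd, i.e. whether it is 68%, 90% or 95%?

@pRedator - If you are using it as a test, yes (in other words: is this different from, or the same as, another distribution at the p confidence level?).

@ThePredator - arctan2 returns the full angle (it can be in any of the 4 quadrants). arctan restricts the output to quadrants 1 and 4 (between -pi/2 and pi/2). You may notice that arctan takes only one argument. It therefore cannot distinguish angles in quadrants 1 and 4 from the corresponding angles in quadrants 2 and 3. This is a convention shared by many other programming languages, in large part because C defines it that way.

The ellipse shown in his answer is not a 2-sigma ellipse. It is a 3-sigma ellipse, and it contains as many points as you would expect from a 3-sigma ellipse.

I believe the discrepancy arises because this answer describes a confidence ellipse, whereas Joe's answer describes an N-sigma error ellipse. The difference between the two is explained in the linked reference.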
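The point about arctan2 versus arctan can be checked directly. This is a small illustration with an arbitrarily chosen vector in quadrant 3:

```python
import numpy as np

# a direction vector pointing into quadrant 3
x, y = -1.0, -1.0

# arctan2 sees both components and returns the full angle
full_angle = np.degrees(np.arctan2(y, x))

# arctan only sees the ratio y / x, so the sign information is lost
# and the result falls back into quadrant 1
ratio_angle = np.degrees(np.arctan(y / x))

print(full_angle, ratio_angle)
```

For an ellipse the two results differ by 180 degrees, which draws the same shape, but the full angle matters whenever the direction of the eigenvector itself is needed.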
