Python: How do I plot the probability density function (PDF) of event arrival times?

I have an array of data values like this:

0.000000000000000000e+00
3.617000000000000171e+01
1.426779999999999973e+02
2.526699999999999946e+01
4.483190000000000168e+02
7.413999999999999702e+00
1.132390000000000043e+02
8.797000000000000597e+00
1.362599999999999945e+01
2.080880900000000111e+04
5.580000000000000071e+00
3.947999999999999954e+00
2.615000000000000213e+00
2.458000000000000185e+00
8.204600000000000648e+01
1.641999999999999904e+00
5.108999999999999986e+00
2.388999999999999790e+00
2.105999999999999872e+00
5.783000000000000362e+00
4.309999999999999609e+00
3.685999999999999943e+00
6.339999999999999858e+00
2.198999999999999844e+00
3.568999999999999950e+00
2.883999999999999897e+00
7.307999999999999829e+00
2.515000000000000124e+00
3.810000000000000053e+00
2.829000000000000181e+00
2.593999999999999861e+00
3.963999999999999968e+00
7.258000000000000007e+00
3.543000000000000149e+00
2.874000000000000110e+00
................... and so on. 
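I want to plot the probability density function of these data values. I consulted an earlier reference on this, but I cannot tell whether it is correct. I am using Python. My simple code for plotting the data is: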

from matplotlib import pyplot as plt
plt.plot(Data)
Now I want to plot the PDF (probability density function) instead, but I have not found a Python library that does this.

Use np.histogram.

For example:

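import numpy as np
from matplotlib import pyplot as plt
# a is your data array
# density=True normalizes the histogram so it integrates to 1 (it replaces the removed normed=True)
hist, bins = np.histogram(a, bins=100, density=True)
bin_centers = (bins[1:] + bins[:-1]) * 0.5
plt.plot(bin_centers, hist)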
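
As a quick sanity check on the normalization, the sketch below confirms that a histogram computed with density=True has unit area. The array holds only a few of the values listed in the question, and the choice of 10 bins is an assumption made for illustration.

```python
import numpy as np

# A few of the values listed in the question (illustration only).
a = np.array([0.0, 36.17, 142.678, 25.267, 448.319, 7.414, 113.239,
              8.797, 13.626, 5.58, 3.948, 2.615, 2.458, 82.046])

# density=True rescales the counts so that the histogram integrates to 1.
hist, bins = np.histogram(a, bins=10, density=True)

# Area under the histogram: bar height times bar width, summed over all bins.
bin_widths = np.diff(bins)
total_area = float(np.sum(hist * bin_widths))
print(total_area)
```

Because the heights are densities rather than probabilities, individual values of hist may exceed 1 when the bins are narrow; only the total area is constrained.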

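The data set you provided is too small for a reliable kernel density estimate. I will therefore demonstrate the procedure on another data set (assuming I have understood correctly what you are trying to do).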

import numpy as np
import scipy.stats

# generate data samples
data = scipy.stats.expon.rvs(loc=0, scale=1, size=1000, random_state=123)
Then simply call

scipy.stats.gaussian_kde(data,bw_method=bw)
where bw is an (optional) parameter of the estimation procedure. For this data set, the fits obtained with three values of bw are shown below.

# test values for the bw_method option ('None' is the default value)
bw_values =  [None, 0.1, 0.01]

# generate a list of kde estimators for each bw
kde = [scipy.stats.gaussian_kde(data,bw_method=bw) for bw in bw_values]


# plot (normalized) histogram of the data
import matplotlib.pyplot as plt
plt.hist(data, 50, density=True, facecolor='green', alpha=0.5)

# plot density estimates
t_range = np.linspace(-2,8,200)
for i, bw in enumerate(bw_values):
    plt.plot(t_range,kde[i](t_range),lw=2, label='bw = '+str(bw))
plt.xlim(-1,6)
plt.legend(loc='best')


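Note that larger bw values give a smoother PDF estimate, but at the cost (in this example) of suggesting that negative values are possible, which is not the case here.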
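
To put a number on that trade-off, the following sketch measures how much probability each estimate places below zero. It reuses the synthetic exponential sample and the three bw values from the code above; the list name negative_mass is introduced here for illustration only.

```python
import numpy as np
import scipy.stats

# Same synthetic exponential sample as in the example above.
data = scipy.stats.expon.rvs(loc=0, scale=1, size=1000, random_state=123)

bw_values = [None, 0.1, 0.01]
negative_mass = []
for bw in bw_values:
    kde = scipy.stats.gaussian_kde(data, bw_method=bw)
    # Probability the estimate assigns to t < 0, where the true density is zero.
    negative_mass.append(kde.integrate_box_1d(-np.inf, 0.0))

for bw, mass in zip(bw_values, negative_mass):
    print(f"bw = {bw}: estimated P(t < 0) = {mass:.4f}")
```

The leaked mass shrinks as bw decreases, while the estimate itself becomes noisier, so the choice of bw is a compromise between the two.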

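Since you are working with discrete data, your PDF will be grouped into "bins". Building those bins from doubles is hard, because it is hard to establish equality between them, so your current PDF will almost certainly look like a flat line (it is counting N unique values). You need to introduce some way of comparing them, such as rounding.

I can round it to 2 decimal places. How do I plot it then?

@ScottsTainton After rounding, you need to count the occurrences of each number and divide by the total number of data points you have, which gives you the probability of each value. Plotting these values gives you your PDF.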
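
The round-and-count approach described in the comments could be sketched as follows. The array holds only a few of the values from the question, and the names rounded, values, counts and probabilities are introduced here for illustration.

```python
import numpy as np

# A few of the values listed in the question (illustration only).
data = np.array([5.58, 3.948, 2.615, 2.458, 1.642, 5.109, 2.389,
                 2.106, 5.783, 4.31, 3.686, 6.34, 2.199, 3.569])

# Round to 2 decimal places so that nearly equal values compare as equal.
rounded = np.round(data, 2)

# Count how often each rounded value occurs.
values, counts = np.unique(rounded, return_counts=True)

# Relative frequency of each distinct value.
probabilities = counts / counts.sum()

for v, p in zip(values, probabilities):
    print(v, p)
```

Strictly speaking this gives an empirical probability for each rounded value rather than a density; dividing by the rounding step (0.01 here) would turn it into one. Calling plt.bar(values, probabilities) is one way to draw the result.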