Scaling the fitted PDF of a log-normal distribution to the histogram in Python

python statistics

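I have a set of log-normally distributed samples and would like to fit a distribution to it. I then want to plot the histogram of the samples and the fitted PDF in a single figure, keeping the histogram's original scaling (raw counts).

My question: how can I scale the PDF directly so that it is visible in the histogram plot?

The code looks like this: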

import numpy as np
import scipy.stats

# generate log-normal distributed set of samples
samples   = np.random.lognormal( mean=1., sigma=.4, size=10000 )

# make a fit to the samples and generate the resulting PDF
shape, loc, scale = scipy.stats.lognorm.fit( samples, floc=0 )
x_fit       = np.linspace( samples.min(), samples.max(), 100 )
samples_fit = scipy.stats.lognorm.pdf( x_fit, shape, loc=loc, scale=scale )

To make clearer what I mean, here is a figure:

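My question is whether there is a parameter that easily scales the PDF to the histogram (I did not find one, but that does not mean much...), so that the PDF is visible in the middle plot.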

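What you are asking for is a plot of the expected histogram.

Suppose [a, b] is one of the x intervals (bins) of the histogram. For a random sample of size n, the expected number of samples in that interval is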

(cdf(b) - cdf(a))*n

where cdf(x) is the cumulative distribution function. To plot the expected histogram, you compute that value for each bin.
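As a minimal numeric sketch of the per-bin formula above, the following compares observed and expected counts without plotting. It assumes NumPy's `default_rng` generator and `np.histogram` in place of `plt.hist`; the seed, the bin count of 50, and the variable names are illustrative choices, not part of the original answer.

```python
import numpy as np
import scipy.stats

# Illustrative data: same distribution parameters as in the question.
rng = np.random.default_rng(1234)
samples = rng.lognormal(mean=1.0, sigma=0.4, size=10000)

# Fit a log-normal distribution with the location fixed at 0.
shape, loc, scale = scipy.stats.lognorm.fit(samples, floc=0)

# Observed counts per bin and the bin edges (len(edges) == len(counts) + 1).
counts, edges = np.histogram(samples, bins=50)

# Expected count per bin: (cdf(b) - cdf(a)) * n for each interval [a, b].
cdf = scipy.stats.lognorm.cdf(edges, shape, loc=loc, scale=scale)
expected = samples.size * np.diff(cdf)

# Print the first few bins side by side.
for a, b, obs, exp in zip(edges[:-1][:5], edges[1:][:5], counts[:5], expected[:5]):
    print(f"[{a:.3f}, {b:.3f}]: observed {obs}, expected {exp:.1f}")
```

The expected counts sum to slightly less than n, because the fitted distribution puts a small amount of probability outside the range spanned by the samples.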

The script below shows one way to plot the expected histogram on top of a matplotlib histogram. It generates the following plot:

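Note: because the PDF is the derivative of the CDF, cdf(b) - cdf(a) can be approximated as

cdf(b) - cdf(a) ≈ pdf(m)*(b - a)

where m is, say, the midpoint of the interval [a, b]. So the answer to the exact question you asked is: scale the PDF by multiplying it by the sample size and the histogram bin width. The script contains some commented-out code that shows how to plot the expected histogram using the scaled PDF. But since the CDF is also available for the log-normal distribution, you might as well use it.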
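The PDF scaling described in this note can be sketched as follows. This is an illustration under assumptions rather than the answerer's own code: it uses `np.histogram` with equal-width bins, a hypothetical grid of 200 points for the curve, and an arbitrary seed. It also checks how close the scaled PDF at the bin centers is to the CDF-based expected counts.

```python
import numpy as np
import scipy.stats

# Illustrative data: same distribution parameters as in the question.
rng = np.random.default_rng(1234)
samples = rng.lognormal(mean=1.0, sigma=0.4, size=10000)
shape, loc, scale = scipy.stats.lognorm.fit(samples, floc=0)

# Equal-width bins, so a single bin width applies to the whole histogram.
counts, edges = np.histogram(samples, bins=50)
bin_width = edges[1] - edges[0]

# Scaled PDF on a fine grid: pdf(x) * n * bin_width is in units of counts,
# so it can be drawn directly over an unnormalized histogram.
x_fit = np.linspace(samples.min(), samples.max(), 200)
pdf_scaled = samples.size * bin_width * scipy.stats.lognorm.pdf(
    x_fit, shape, loc=loc, scale=scale)

# Compare the midpoint approximation with the exact CDF differences.
centers = 0.5 * (edges[:-1] + edges[1:])
from_pdf = samples.size * bin_width * scipy.stats.lognorm.pdf(
    centers, shape, loc=loc, scale=scale)
from_cdf = samples.size * np.diff(
    scipy.stats.lognorm.cdf(edges, shape, loc=loc, scale=scale))
max_abs_diff = np.max(np.abs(from_pdf - from_cdf))
print("largest difference in expected counts:", max_abs_diff)
```

With matplotlib, `plt.plot(x_fit, pdf_scaled)` after a `plt.hist(samples, bins=50)` call would overlay the curve on the counts. If the bins do not all have the same width, each bin needs its own width, and the CDF-based form is the simpler choice.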

I knew about scaling the area of the PDF to the histogram, but I did not know about the CDF approach. Thank you very much!

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt


# Generate log-normal distributed set of samples
np.random.seed(1234)
samples = np.random.lognormal(mean=1., sigma=.4, size=10000)

# Make a fit to the samples.
shape, loc, scale = scipy.stats.lognorm.fit(samples, floc=0)

# Create the histogram plot using matplotlib.  The first two values in
# the tuple returned by hist are the number of samples in each bin and
# the values of the histogram's bin edges.  counts has length num_bins,
# and edges has length num_bins + 1.
num_bins = 50
clr = '#FFE090'
counts, edges, patches = plt.hist(samples, bins=num_bins, color=clr, label='Sample histogram')

# Create an array of length num_bins containing the center of each bin.
centers = 0.5*(edges[:-1] + edges[1:])

# Compute the CDF at the edges. Then prob, the array of differences,
# is the probability of a sample being in the corresponding bin.
cdf = scipy.stats.lognorm.cdf(edges, shape, loc=loc, scale=scale)
prob = np.diff(cdf)

plt.plot(centers, samples.size*prob, 'k-', linewidth=2, label='Expected histogram')

# prob can also be approximated using the PDF at the centers multiplied
# by the width of the bin:
# p = scipy.stats.lognorm.pdf(centers, shape, loc=loc, scale=scale)
# prob = p*(edges[1] - edges[0])
# plt.plot(centers, samples.size*prob, 'r')

plt.legend()

plt.show()
