Python: Standard deviation of binned values with `scipy.stats.binned_statistic`

When I bin data with `scipy.stats.binned_statistic()`, how can I get the error (i.e. the standard deviation) of the binned mean values?

For example, if I bin my data as follows:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

windspeed = 8 * np.random.rand(500)
boatspeed = .3 * windspeed**.5 + .2 * np.random.rand(500)
bin_means, bin_edges, binnumber = stats.binned_statistic(windspeed,
             boatspeed, statistic='median', bins=[1,2,3,4,5,6,7])
plt.figure()
plt.plot(windspeed, boatspeed, 'b.', label='raw data')
plt.hlines(bin_means, bin_edges[:-1], bin_edges[1:], colors='g', lw=5,
        label='binned statistic of data')
plt.legend()
plt.show()

how can I get the standard deviation of `bin_means`?
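One approach is to construct a probability density estimate from the histogram (this is just a matter of normalizing the histogram appropriately), and then compute the standard deviation, or any other statistic, of the estimated density.
The appropriate normalization is whatever is needed to make the area under the histogram equal to 1. To compute a statistic of the density estimate, start from the definition of a statistic as the integral $\int_{-\infty}^{+\infty} p(x)\, f(x)\, dx$, substitute the density estimate for $p(x)$, and use whatever $f(x)$ you need. For example, $f(x) = x$ and $f(x) = x^2$ give the first and second moments $\mu_1$ and $\mu_2$, from which you compute the variance $\sigma^2 = \mu_2 - \mu_1^2$ and then the standard deviation $\sigma$.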
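The moment recipe above can be sketched in NumPy as follows. This is a minimal sketch, assuming `numpy.histogram(..., density=True)` as the normalization step (it scales the histogram to unit area); the function name `histogram_moments` is hypothetical. Because the histogram density is constant within each bin, the integrals of $x\,p(x)$ and $x^2 p(x)$ are evaluated exactly, bin by bin.

```python
import numpy as np

def histogram_moments(samples, bins=10):
    """Mean and standard deviation of a histogram density estimate (hypothetical helper)."""
    # density=True normalizes so that the area under the histogram is 1.
    density, edges = np.histogram(samples, bins=bins, density=True)
    a, b = edges[:-1], edges[1:]  # left and right edge of each bin
    # p(x) is constant on [a, b), so each integral has a closed form per bin:
    # integral of x dx = (b^2 - a^2) / 2, integral of x^2 dx = (b^3 - a^3) / 3.
    mu1 = np.sum(density * (b**2 - a**2) / 2)  # first moment, f(x) = x
    mu2 = np.sum(density * (b**3 - a**3) / 3)  # second moment, f(x) = x^2
    variance = max(mu2 - mu1**2, 0.0)  # guard against tiny negative rounding errors
    return mu1, np.sqrt(variance)

mean, std = histogram_moments([0, 0, 1, 1], bins=2)
```

For `[0, 0, 1, 1]` with two bins the estimated density is uniform on [0, 1], so the sketch returns a mean of 0.5 and a standard deviation of 1/sqrt(12), not the 0.5 that `np.std` gives on the raw samples; the binning itself smooths the data.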
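I'll post some formulas tomorrow, unless someone else wants to take a shot in the meantime. You can probably look the formulas up, but my advice is to always work out the answer yourself before looking it up.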
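Maybe I'm answering a bit late, but I was wondering how to do the same thing and came across this question. I think it should be possible to compute it with `stats` directly, but I haven't figured out how yet. For now, I compute it manually, like this (note that in my code I use a fixed number of equally spaced bins):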
[Figure: resulting plot, without the raw data points]

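Comment on the question: What is your definition of the error on `bin_means`? You should probably state it in the problem statement. Reply: @RobertDodier, the standard deviation is fine.

You have to be careful with the bins. In the code I'm actually using, one of the bins had no points, and I had to adjust my stdev calculation accordingly.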
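One way to handle empty bins without a Python loop is to accumulate per-bin sums with `numpy.bincount`. This is a sketch under stated assumptions: `binnumber` holds the 1-based bin indices returned by `scipy.stats.binned_statistic` for 1-D input, the result is the population standard deviation (ddof=0, like `numpy.std`), and the helper name `binned_std_bincount` is hypothetical. Empty bins come out as NaN, so the result always has one entry per bin and stays aligned with the bin edges.

```python
import numpy as np

def binned_std_bincount(values, binnumber, nbins):
    """Per-bin population std for bins 1..nbins; NaN where a bin is empty (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    binnumber = np.asarray(binnumber)
    # minlength ensures every bin index 0..nbins has an entry, even if empty.
    count = np.bincount(binnumber, minlength=nbins + 1)
    total = np.bincount(binnumber, weights=values, minlength=nbins + 1)
    with np.errstate(invalid='ignore', divide='ignore'):
        mean = total / count  # 0/0 gives NaN for empty bins
        # Two-pass variance: squared deviations from each point's own bin mean.
        delta = values - mean[binnumber]
        sq = np.bincount(binnumber, weights=delta**2, minlength=nbins + 1)
        std = np.sqrt(sq / count)
    # Drop index 0 (points below the range) and anything past the last bin.
    return std[1:nbins + 1]

result = binned_std_bincount([1, 3, 10, 10, 5], [1, 1, 2, 2, 4], nbins=4)
```

Here bin 3 receives no points, so `result` is `[1.0, 0.0, nan, 0.0]`, with the gap marked rather than silently shifting the later bins.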
import numpy
from scipy import stats
from matplotlib import pyplot

windspeed = 8 * numpy.random.rand(500)
boatspeed = .3 * windspeed**.5 + .2 * numpy.random.rand(500)
bin_means, bin_edges, binnumber = stats.binned_statistic(windspeed,
         boatspeed, statistic='median', bins=10)

stds = []

# Match each value to the bin number it belongs to.
# list() is needed in Python 3, where zip() returns a one-shot iterator
# that would be exhausted after the first bin.
pairs = list(zip(boatspeed, binnumber))

# Calculate stdev for all elements inside each bin. Iterate over every bin
# index (1..number of bins) instead of set(binnumber), so that stds stays
# aligned with bin_means even if a bin is empty.
for n in range(1, len(bin_edges)):
    in_bin = [x for x, nbin in pairs if nbin == n]  # Get all elements inside bin n
    stds.append(numpy.std(in_bin) if in_bin else numpy.nan)  # NaN for an empty bin
stds = numpy.array(stds)

# Calculate the locations of the bins' centers, for plotting
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2.

# Plot the per-bin medians (statistic='median', despite the name bin_means)
pyplot.figure()
pyplot.hlines(bin_means, bin_edges[:-1], bin_edges[1:], colors='g', lw=5,
    label='binned statistic of data')

# Plot stdev as vertical lines, probably can also be done with errorbar
pyplot.vlines(bin_centers, bin_means - stds, bin_means + stds)

pyplot.legend()
pyplot.show()
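The per-bin standard deviation can likely also be obtained from `binned_statistic` itself, by calling it a second time with the same bin edges. This is a sketch assuming a SciPy release whose `binned_statistic` accepts `statistic='std'` (population std, ddof=0); on any version, passing the callable `np.std` as `statistic` should give the same numbers. The seeded `default_rng` data is only there to make the example reproducible.

```python
import numpy as np
from scipy import stats

# Same synthetic data as in the question, with a seeded generator.
rng = np.random.default_rng(0)
windspeed = 8 * rng.random(500)
boatspeed = .3 * windspeed**.5 + .2 * rng.random(500)

# Per-bin median, as in the answer above.
bin_medians, bin_edges, binnumber = stats.binned_statistic(
    windspeed, boatspeed, statistic='median', bins=10)

# Per-bin standard deviation; reusing bin_edges keeps the bins identical.
bin_stds, _, _ = stats.binned_statistic(
    windspeed, boatspeed, statistic='std', bins=bin_edges)

bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
```

For plotting, `pyplot.errorbar(bin_centers, bin_medians, yerr=bin_stds, fmt='none')` should draw the same vertical bars as the `vlines` call above.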