Python: dividing a matplotlib histogram by its maximum bin value


python numpy matplotlib


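I want to plot several histograms on the same figure, because I need to compare the distributions of the data. I would like to divide each histogram by its maximum value so that all distributions share the same scale. However, given how matplotlib's hist function works, I have not found an easy way to do this.

The reason is that n in

n, bins, patches = ax1.hist(y, bins = 20, histtype = 'step', color = 'k')

is the number of counts in each bin, but I cannot pass it back to hist, because hist would bin it all over again.

I tried the normed and density arguments, but they normalize the area of the distribution rather than its height. I could duplicate n and repeat the bin edges from the bins output, but that is tedious. Surely the hist function must allow the bin values to be divided by a constant?

Here is sample code that demonstrates the problem: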

import numpy as np
import matplotlib.pyplot as plt

# Two samples of different size and spread
y1 = np.random.randn(100)
y2 = 2*np.random.randn(50)
x1 = np.linspace(1,101,100)
x2 = np.linspace(1,51,50)

# Wide scatter panel on the left, narrow histogram panel on the right
gs = plt.GridSpec(1,2, wspace = 0, width_ratios = [3,1])
ax = plt.subplot(gs[0])
ax1 = plt.subplot(gs[1])
ax1.yaxis.set_ticklabels([])   # hide the y tick labels on the histogram panel

ax.scatter(x1, y1, marker='+',color = 'k')
ax.scatter(x2, y2, marker='o',color = 'k')
# The raw counts differ because the sample sizes differ
n1, bins1, patches1 = ax1.hist(y1, bins = 20, histtype = 'step', color = 'k',linewidth = 2, orientation = 'horizontal')
n2, bins2, patches2 = ax1.hist(y2, bins = 20, histtype = 'step', linestyle = 'dashed', color = 'k', orientation = 'horizontal')
plt.show()
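The difference between area and height normalization mentioned in the question can be checked numerically without plotting anything. The following is only a small sketch: the seed, the sample size, and the variable names are arbitrary choices made for illustration, not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only so the sketch is repeatable
y = rng.standard_normal(100)

# density=True rescales the bins so that the bar areas sum to 1
density, edges = np.histogram(y, bins=20, density=True)
area = np.sum(density * np.diff(edges))

# Dividing the raw counts by the largest count makes the tallest bin equal to 1
counts, _ = np.histogram(y, bins=20)
peak_scaled = counts / counts.max()

print(f"area under the density histogram: {area:.3f}")
print(f"tallest density bin:              {density.max():.3f}")
print(f"tallest peak-scaled bin:          {peak_scaled.max():.3f}")
```

With density=True the tallest bin depends on the bin width and the spread of the data, which is why two such histograms generally do not peak at the same height.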

You can pass the bins argument as a list of values. Use np.arange() or np.linspace() to generate the values.
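As a rough illustration of that suggestion, the sketch below builds one set of bin edges with np.linspace and hands it to hist for both samples. The data, the seed, and the choice of 20 bins are assumptions made for the example. Note that this only makes the bins identical; it does not rescale the heights.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # arbitrary seed for repeatability
y1 = rng.standard_normal(100)
y2 = 2 * rng.standard_normal(50)

# 21 edges give 20 bins that span both samples
lo = min(y1.min(), y2.min())
hi = max(y1.max(), y2.max())
edges = np.linspace(lo, hi, 21)

fig, ax = plt.subplots()
# Passing the same edge array to both calls puts the samples on identical bins
n1, bins1, _ = ax.hist(y1, bins=edges, histtype="step", color="k", linewidth=2)
n2, bins2, _ = ax.hist(y2, bins=edges, histtype="step", color="k",
                       linestyle="dashed")
```

Call plt.show() afterwards to display the figure.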

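I don't know whether matplotlib allows this kind of normalization by default, but I wrote a function of my own to do it.

It takes the outputs n and bins from plt.hist (as above) and passes them through the function below.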

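import numpy as np

def hist_norm_height(n,bins,const):
    ''' Function to normalise bin height by a constant. 
        Needs n and bins from np.histogram or ax.hist.'''

    # Duplicate each count so that both ends of a bin get a point
    n = np.repeat(n,2)
    # Convert to floating point before scaling to avoid integer division
    n = np.asarray(n, dtype=float) / const
    # Keep the first edge once and every later edge twice
    new_bins = [bins[0]]
    new_bins.extend(np.repeat(bins[1:],2))
    # Drop the final duplicate so that x and y have the same length
    return n,new_bins[:-1]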

To plot the result (I like step histograms), pass it to plt.step, for example plt.step(new_bins, n). This gives you a histogram whose heights are normalized by a constant.
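Put together, the approach from this answer might look like the sketch below. The function is repeated so the block runs on its own; the two samples, the seed, and the choice of counts.max() as the constant are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def hist_norm_height(n, bins, const):
    """Scale counts by const and duplicate points to trace a step outline."""
    heights = np.repeat(np.asarray(n, dtype=float), 2) / const
    new_bins = [bins[0]]
    new_bins.extend(np.repeat(bins[1:], 2))
    return heights, new_bins[:-1]

rng = np.random.default_rng(2)  # arbitrary seed
samples = {"solid": rng.standard_normal(100),
           "dashed": 2 * rng.standard_normal(50)}

fig, ax = plt.subplots()
outlines = {}
for style, data in samples.items():
    # Bin with numpy so nothing is drawn before the heights are rescaled
    counts, edges = np.histogram(data, bins=20)
    heights, xs = hist_norm_height(counts, edges, counts.max())
    ax.step(xs, heights, color="k", linestyle=style)
    outlines[style] = (xs, heights)
ax.set_ylim(0, 1.05)
```

Because each sample is divided by its own largest count, both outlines peak at exactly 1 regardless of sample size.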

Here is a slightly different setup for comparison. It can be adapted to the step style:

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np

y = []
y.append(np.random.normal(2, 2, size=40))
y.append(np.random.normal(3, 1.5, size=40))
y.append(np.random.normal(4,4,size=40))
ls = ['dashed','dotted','solid']

fig, (ax1, ax2, ax3) = plt.subplots(ncols=3)
for l, data in zip(ls, y):
    # Raw counts
    ax1.hist(data, density=False,
             #histtype='step', #step's too much of a pain to get the bins
             #color='k', linestyle=l,
             alpha=0.2
             )
    # Area normalised to 1 (density replaces the removed normed argument)
    ax2.hist(data, density=True,
             #histtype = 'step', color='k', linestyle=l,
             alpha=0.2
             )

    # Raw counts again; the bars are then rescaled so the tallest has height 1
    n, b, p = ax3.hist(data, density=False,
                       #histtype='step', #step's too much of a pain to get the bins
                       #color='k', linestyle=l,
                       alpha=0.2
                       )
    high = float(max([r.get_height() for r in p]))
    for r in p:
        # The rectangles already belong to ax3, so resizing them is enough
        r.set_height(r.get_height()/high)
ax3.set_ylim(0,1)

ax1.set_title('hist')
ax2.set_title('area==1')
ax3.set_title('fix height')
plt.show()
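An alternative to resizing the patches afterwards is to bin the data first and then let hist draw the already scaled counts through its weights argument. This is only a sketch: the sample, the seed, and the number of bins are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)  # arbitrary seed
data = rng.normal(3, 1.5, size=40)

# Bin once with numpy to get the raw counts and the edges
counts, edges = np.histogram(data, bins=10)

fig, ax = plt.subplots()
# Each left edge acts as a single "observation" in its own bin,
# weighted by the scaled count, so hist draws the rescaled heights
n, _, _ = ax.hist(edges[:-1], bins=edges, weights=counts / counts.max(),
                  histtype="step", color="k")
```

The values returned in n are the scaled heights, so step-style histograms work here without touching any patches.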


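It seems to me that normed is the way to go.

Unfortunately, normed normalizes the area under the curve, not the height.

Yes, but that is usually the right way to compare histograms. Are you looking for a different statistic?

I agree. But these are two different distributions, and I want to compare how the data are distributed. That is most apparent when I scale to the height, because one has a maximum bin value of 150 and the other a maximum bin value of 30.

Couldn't you extract const from ...? I still think the normalized version makes it easier to compare tall and wide distributions...

This can be done by using numpy to obtain the histogram values beforehand and then plotting them as bars: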

import numpy as np
import matplotlib.pyplot as plt

# Define random data and number of bins to use
x = np.random.randn(1000)
bins = 10

plt.figure()
# Obtain the bin values and edges using numpy
hist, bin_edges = np.histogram(x, bins=bins, density=True)
# Plot bars with the proper positioning, height, and width.
plt.bar(
    (bin_edges[1:] + bin_edges[:-1]) * .5, hist / hist.max(),
    width=(bin_edges[1] - bin_edges[0]), color="blue")

plt.show()
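For the step outlines the question asks about, the same numpy-first idea can be combined with Axes.stairs, which draws precomputed values against bin edges. The sketch below assumes matplotlib 3.4 or newer (where stairs is available); the samples, seed, and bin count are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)  # arbitrary seed
samples = {"solid": rng.standard_normal(1000),
           "dashed": 2 * rng.standard_normal(200)}

fig, ax = plt.subplots()
peaks = []
for style, data in samples.items():
    counts, edges = np.histogram(data, bins=10)
    scaled = counts / counts.max()  # tallest bin becomes exactly 1
    # stairs takes one value per bin plus the bin edges
    ax.stairs(scaled, edges, color="k", linestyle=style)
    peaks.append(scaled.max())
ax.set_ylim(0, 1.05)
```

Each sample keeps its own bin edges here, so bins of unequal width between samples are drawn correctly.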