Python: how to get the data in a histogram bin

python numpy matplotlib

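I want to get a list of the data contained in a histogram bin. I am using numpy and Matplotlib. I know how to loop over the data and check the bin edges. However, I want to do this for a 2D histogram, and the code to do so is rather ugly. Does numpy have any construct that makes this easier?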

For the 1D case, I can use searchsorted(). But the logic isn't much better, and I don't want to run a binary search on every data point when I don't need to.

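Most of the nasty logic comes from the bin boundary regions. Every bin is half-open: [left edge, right edge). The exception is the last bin, which is closed on both sides: [left edge, right edge].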

Here is some sample code for the 1D case:

import numpy as np

data = [0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3]

hist, edges = np.histogram(data, bins=3)

print('data =', data)
print('histogram =', hist)
print('edges =', edges)

getbin = 2  # 0, 1, or 2

print('---')
print('alg 1:')

#for i in range(len(data)):
for d in data:
    # left edge is always inclusive
    if d >= edges[getbin]:
        # right edge is exclusive, except for the last bin
        if (getbin == len(edges)-2) or d < edges[getbin+1]:
            print('found:', d)
        #end if
    #end if
#end for

print('---')
print('alg 2:')

for d in data:
    # clamp so a value sitting on the final right edge lands in the last bin
    val = min(np.searchsorted(edges, d, side='right')-1, len(edges)-2)
    if val == getbin:
        print('found:', d)
    #end if
#end for

Here is some sample code for the 2D case:

import numpy as np

xdata = [0, 1.5, 1.5, 2.5, 2.5, 2.5, \
         0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, \
         0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 3]
ydata = [0, 5, 5, 5, 5, 5, \
         15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, \
         25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 30]

xbins = 3
ybins = 3
hist2d, xedges, yedges = np.histogram2d(xdata, ydata, bins=(xbins, ybins))

print('data2d =', list(zip(xdata, ydata)))
print('hist2d =')
print(hist2d)
print('xedges =', xedges)
print('yedges =', yedges)

getbin2d = 5  # 0 through 8

print('find data in bin #', getbin2d)

# decode the flat bin number into an x index and a y index
xedge_i = getbin2d % xbins
yedge_i = getbin2d // xbins #IMPORTANT: this is xbins

for x, y in zip(xdata, ydata):
    # x and y left edges
    if x >= xedges[xedge_i] and y >= yedges[yedge_i]:
        #x right edge
        if xedge_i == xbins-1 or x < xedges[xedge_i + 1]:
            #y right edge
            if yedge_i == ybins-1 or y < yedges[yedge_i + 1]:
                print('found:', x, y)
            #end if
        #end if
    #end if
#end for

Is there a cleaner / more efficient way to do this? It seems like numpy would have something for this.

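How about something like this:

import numpy

data = numpy.array([0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3])
hist, edges = numpy.histogram(data, bins=3)
for l, r in zip(edges[:-1], edges[1:]):
    # strict inequalities: values sitting exactly on an edge are skipped
    print(data[(data > l) & (data < r)])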

Out:
[0.5]
[ 1.5  1.5  1.5]
[ 2.5  2.5  2.5]

With a little extra code to handle the edge cases.
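
One way that extra edge-case code might look is sketched below. It is not part of the original answer: the helper name values_in_bin is made up for illustration, and the sketch simply applies the rule stated in the question (every bin is [left, right), except the last one, which is [left, right]).

```python
import numpy as np

data = np.array([0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3])
hist, edges = np.histogram(data, bins=3)


def values_in_bin(data, edges, i):
    """Hypothetical helper: return the values that fall in bin i."""
    left, right = edges[i], edges[i + 1]
    if i == len(edges) - 2:
        # The last bin is closed on the right: [left, right]
        mask = (data >= left) & (data <= right)
    else:
        # Every other bin is half-open: [left, right)
        mask = (data >= left) & (data < right)
    return data[mask]


# One array of values per bin, in bin order
per_bin = [values_in_bin(data, edges, i) for i in range(len(hist))]
for i, values in enumerate(per_bin):
    print(i, values)
```

With these masks the number of values returned for each bin should agree with the counts in hist, including the 0 and the 3 that the strict inequalities leave out.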
digitize(), from core NumPy, gives you the index of the bin that each value in your histogram belongs to:
import numpy as NP
A = NP.random.randint(0, 10, 100)

bins = NP.array([0., 20., 40., 60., 80., 100.])

# d is an index array holding the bin id for each point in A
d = NP.digitize(A, bins)     
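
As a rough sketch of how digitize() could be applied to the 2D case from the question: the helper name bin_index and the random test data are assumptions made for this example, not part of the answer. The sketch also assumes every value lies within the edges, which holds when the edges come from histogram2d on the same data with no explicit range. digitize() returns 1-based indices and puts a value equal to the last edge one bin too far, so the result is shifted by one and folded back into the last bin.

```python
import numpy as np

# Made-up sample data, for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, 200)
y = rng.uniform(0, 30, 200)

hist2d, xedges, yedges = np.histogram2d(x, y, bins=(3, 3))


def bin_index(values, edges):
    """Hypothetical helper: 0-based bin index of each value.

    Assumes edges[0] <= value <= edges[-1] for every value.
    """
    idx = np.digitize(values, edges) - 1
    # A value equal to the final edge belongs to the last bin
    return np.minimum(idx, len(edges) - 2)


xi = bin_index(x, xedges)
yi = bin_index(y, yedges)

# Select the points in x-bin 2, y-bin 1 (hist2d[2, 1])
i, j = 2, 1
in_bin = (xi == i) & (yi == j)
points = np.column_stack((x[in_bin], y[in_bin]))
print(len(points), hist2d[i, j])
```

The same index arrays xi and yi can be reused for any other bin, so the data only has to be binned once.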

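Just out of curiosity: why do you use comments such as #end if in your code? "Every pixel counts." By doing this, you defeat the purpose of indentation.

Two reasons. First, I'm a C++ developer first and a Python developer second. Python's lack of braces bugs me. When I have a complex block of code with many different indentation levels, I don't want to count whitespace. Second, I do most of my development in Emacs. By putting comments at the close of code blocks, I can press TAB on each line and Emacs won't try to indent something incorrectly.

This is almost perfect! If there are any numpy devs out there, this function should really be included in the "See Also" section of the histogram documentation. It's too bad that the digitize() bin logic doesn't exactly match the histogram() bin logic, though. So this results in code just as clumsy as my other examples above.

Isn't this exactly the same as bins.searchsorted(A, 'right')?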
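
A quick way to check that last question is sketched below. The sample data is made up (integers from 0 to 99 so that several bins are populated), and the comparison only covers the case shown in the answer: monotonically increasing bins with the default right=False.

```python
import numpy as np

# Made-up sample data, for illustration only
rng = np.random.default_rng(1)
A = rng.integers(0, 100, 100)

bins = np.array([0., 20., 40., 60., 80., 100.])

d = np.digitize(A, bins)           # bin id for each point in A
s = bins.searchsorted(A, 'right')  # second argument is side

same = np.array_equal(d, s)
print(same)
```

For increasing bins the two calls should agree; digitize() additionally accepts decreasing bins and a right=True option, which a plain searchsorted call does not cover.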