Python: placing x, y coordinates into bins

python python-2.7 pandas numpy


I have a pandas DataFrame in which two columns contain x, y coordinates, which I plot as follows:

plt.figure(figsize=(10,5))
plt.scatter(df.x, df.y, s=1, marker = ".")
plt.xlim(-1.5, 1.5)
plt.ylim(0, 2)
plt.xticks(np.arange(-1.5, 1.6, 0.1))
plt.yticks(np.arange(0, 2.1, 0.1))
plt.grid(True)
plt.show()

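I want to split the x axis and the y axis every 0.1 units, which gives 600 bins (30x20). Then I want to know how many points fall in each bin, along with the indices of those points, so that I can look them up in my DataFrame. Essentially, I want to create 600 new DataFrames, one per bin.

This is what I have tried so far: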

df[(df.x >= -0.1) & (df.x < 0) & (df.y >= 0.7) & (df.y < 0.8)]


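This gives me the part of the DataFrame contained in the square (-0.1 ≤ x < 0) and (0.7 ≤ y < 0.8). I would like a way to create 600 of these.

This is one of many ways to do it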

bins = (df // .1 * .1).round(1).stack().groupby(level=0).apply(tuple)

dict_of_df = {name: group for name, group in df.groupby(bins)}

You can get a DataFrame of the counts with

df.groupby(bins).size().unstack()
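
As a sketch of the same idea with integer bin indices instead of rounded float labels, one could floor-divide each coordinate by the bin width and group on the resulting pair of integers. The sample data, the `width` variable, and the names `ix` and `iy` are assumptions made for illustration and are not part of the original answer. Values lying exactly on a bin edge can still be affected by floating-point rounding.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering the ranges from the question
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(-1.5, 1.5, 1000),
                   "y": rng.uniform(0.0, 2.0, 1000)})

width = 0.1  # assumed bin width

# Integer bin index along each axis: bin i covers [i * width, (i + 1) * width)
ix = np.floor(df["x"] / width).astype(int)
iy = np.floor(df["y"] / width).astype(int)

# One sub-DataFrame per occupied bin, keyed by the integer pair (i, j)
dict_of_df = {key: group for key, group in df.groupby([ix, iy])}

# Number of points per bin as a table (x index in rows, y index in columns)
counts = df.groupby([ix, iy]).size().unstack(fill_value=0)

# Points with -0.1 <= x < 0 and 0.7 <= y < 0.8 live under the key (-1, 7);
# .get() returns an empty frame if no point fell into that bin
subset = dict_of_df.get((-1, 7), df.iloc[0:0])
print(len(subset), list(subset.index))
```

Integer keys avoid having to format or round float labels when looking a bin up again.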

You can convert your units into their respective indices, 0-29 for x and 0-19 for y, and then increment a matrix of zeros

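import numpy as np

shape = [30, 20]
bins = np.zeros(shape, dtype=int)

# convert x to integer bin indices 0-29
xmin = np.min(df.x)
xmax = np.max(df.x)
xwidth = xmax - xmin
xind = ((df.x - xmin) / xwidth * shape[0]).astype(int)
xind = np.minimum(xind, shape[0] - 1)  # keep the maximum x in the last bin

# convert y to integer bin indices 0-19 in the same way
ymin = np.min(df.y)
ymax = np.max(df.y)
ywidth = ymax - ymin
yind = ((df.y - ymin) / ywidth * shape[1]).astype(int)
yind = np.minimum(yind, shape[1] - 1)  # keep the maximum y in the last bin

# count the points that fall into each (x, y) bin
for ind in zip(xind, yind):
    bins[ind] += 1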
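
The counting loop above can also be written without an explicit Python loop. The following is a minimal sketch, assuming the fixed plot limits from the question (-1.5 to 1.5 and 0 to 2) rather than the data minimum and maximum; the sample data and the limit variables are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering the ranges from the question
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.uniform(-1.5, 1.5, 1000),
                   "y": rng.uniform(0.0, 2.0, 1000)})

shape = (30, 20)
xmin, xmax = -1.5, 1.5  # assumed axis limits, taken from the plot in the question
ymin, ymax = 0.0, 2.0

# Scale each coordinate to [0, n_bins) and truncate to an integer index
xind = np.floor((df["x"].to_numpy() - xmin) / (xmax - xmin) * shape[0]).astype(int)
yind = np.floor((df["y"].to_numpy() - ymin) / (ymax - ymin) * shape[1]).astype(int)

# Clip so that points lying exactly on the upper limit land in the last bin
xind = np.clip(xind, 0, shape[0] - 1)
yind = np.clip(yind, 0, shape[1] - 1)

# np.add.at is unbuffered, so repeated (x, y) index pairs accumulate correctly
bins = np.zeros(shape, dtype=int)
np.add.at(bins, (xind, yind), 1)
```

A plain `bins[xind, yind] += 1` would count each repeated index pair only once, which is why `np.add.at` is used here.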

I would use the cut function to create the bins, then group by them and count

import numpy as np
import pandas as pd

# create fake data with bounds for x and y
df = pd.DataFrame({'x':np.random.rand(1000) * 3 - 1.5,
                   'y':np.random.rand(1000) * 2})

# bin the data into equally spaced groups
x_cut = pd.cut(df.x, np.linspace(-1.5, 1.5, 31), right=False)
y_cut = pd.cut(df.y, np.linspace(0, 2, 21), right=False)

# group and count
df.groupby([x_cut, y_cut]).count()

Output

                           x    y
x            y                   
[-1.5, -1.4) [0, 0.1)    3.0  3.0
             [0.1, 0.2)  1.0  1.0
             [0.2, 0.3)  3.0  3.0
             [0.3, 0.4)  NaN  NaN
             [0.4, 0.5)  1.0  1.0
             [0.5, 0.6)  3.0  3.0
             [0.6, 0.7)  1.0  1.0
             [0.7, 0.8)  2.0  2.0
             [0.8, 0.9)  2.0  2.0
             [0.9, 1)    1.0  1.0
             [1, 1.1)    2.0  2.0
             [1.1, 1.2)  1.0  1.0
             [1.2, 1.3)  2.0  2.0
             [1.3, 1.4)  3.0  3.0
             [1.4, 1.5)  2.0  2.0
             [1.5, 1.6)  3.0  3.0
             [1.6, 1.7)  3.0  3.0
             [1.7, 1.8)  1.0  1.0
             [1.8, 1.9)  1.0  1.0
             [1.9, 2)    1.0  1.0
[-1.4, -1.3) [0, 0.1)    NaN  NaN
             [0.1, 0.2)  NaN  NaN
             [0.2, 0.3)  2.0  2.0


To fully answer your question, you can add the categories as columns to the original DataFrame and then search from there like this

# add new columns
df['x_cut'] = x_cut
df['y_cut'] = y_cut
print(df.head(15))

            x         y         x_cut       y_cut
0    1.239743  1.348838    [1.2, 1.3)  [1.3, 1.4)
1   -0.539468  0.349576  [-0.6, -0.5)  [0.3, 0.4)
2    0.406346  1.922738    [0.4, 0.5)    [1.9, 2)
3   -0.779597  0.104891  [-0.8, -0.7)  [0.1, 0.2)
4    1.379920  0.317418    [1.3, 1.4)  [0.3, 0.4)
5    0.075020  0.748397      [0, 0.1)  [0.7, 0.8)
6   -1.227913  0.735301  [-1.3, -1.2)  [0.7, 0.8)
7   -0.866753  0.386308  [-0.9, -0.8)  [0.3, 0.4)
8   -1.004893  1.120654    [-1.1, -1)  [1.1, 1.2)
9    0.007665  0.865248      [0, 0.1)  [0.8, 0.9)
10  -1.072368  0.155731    [-1.1, -1)  [0.1, 0.2)
11   0.819917  1.528905    [0.8, 0.9)  [1.5, 1.6)
12   0.628310  1.022167    [0.6, 0.7)    [1, 1.1)
13   1.002999  0.122493      [1, 1.1)  [0.1, 0.2)
14   0.032624  0.426623      [0, 0.1)  [0.4, 0.5)

Then, to get the combination you described above:

df[(df.x >= -0.1) & (df.x < 0) & (df.y >= 0.7) & (df.y < 0.8)]

you can set the index to x_cut and y_cut and do some hierarchical index selection

df = df.set_index(['x_cut', 'y_cut'])
df.loc[[('[-0.1, 0)', '[0.7, 0.8)')]]

Output

                             x         y
x_cut     y_cut                         
[-0.1, 0) [0.7, 0.8) -0.043397  0.702029
          [0.7, 0.8) -0.032508  0.799284
          [0.7, 0.8) -0.036608  0.709394
          [0.7, 0.8) -0.025254  0.741085
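
In newer pandas releases the categories produced by pd.cut are Interval objects rather than strings, so a lookup by string label such as '[-0.1, 0)' may not match. The following is a hedged sketch of one way to select a single bin in that case, using the integer category codes; the sample data and the names `i`, `j` and `subset` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering the ranges from the question
rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.uniform(-1.5, 1.5, 5000),
                   "y": rng.uniform(0.0, 2.0, 5000)})

# Left-closed bins of width 0.1, as in the answer above
x_cut = pd.cut(df["x"], np.linspace(-1.5, 1.5, 31), right=False)
y_cut = pd.cut(df["y"], np.linspace(0, 2, 21), right=False)

# Position of the interval containing a representative point of the wanted bin
i = x_cut.cat.categories.get_loc(-0.05)  # the bin -0.1 <= x < 0
j = y_cut.cat.categories.get_loc(0.75)   # the bin 0.7 <= y < 0.8

# Rows whose category codes match both bin positions
subset = df[(x_cut.cat.codes == i) & (y_cut.cat.codes == j)]

# The index of the subset gives the row labels to look up in the original frame
print(len(subset), list(subset.index))
```

Looking the bin up through a value it contains avoids having to reproduce the exact interval edges by hand.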

If I understand correctly, this actually has nothing to do with matplotlib? You just need the data structure, not a different plot?

Yes, that's right.

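counts, x, y = np.histogram2d(df.x, df.y, [xbins, ybins]) does the binning in two dimensions, where xbins and ybins are the arrays that define your bins. In addition, np.digitize does the binning in one dimension, similar to the solution below by @Ted Petrou that uses pd.cut.

To retrieve a specific DataFrame from the dict_of_df dictionary, the notation is dict_of_df['i', 'j']. Does this DataFrame contain the points inside the square (i ≤ x […] >= -0.1) & (df.x < 0) & (df.y >= 0.7) & (df.y < 0.8)], my data gives 885. But len(dict_of_df['-0.1', '0.7']) returns 1011. They should be the same.

When I used '{:0.1f}' it rounded around the edges. That's no good. Sorry. I've updated the post.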
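
The np.histogram2d and np.digitize suggestions from the comments above could look roughly like the sketch below. The sample data and the edge arrays `xbins` and `ybins` are assumptions chosen to match the 0.1-unit grid from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering the ranges from the question
rng = np.random.default_rng(3)
df = pd.DataFrame({"x": rng.uniform(-1.5, 1.5, 1000),
                   "y": rng.uniform(0.0, 2.0, 1000)})

# Bin edges every 0.1 units: 31 edges give 30 x bins, 21 edges give 20 y bins
xbins = np.linspace(-1.5, 1.5, 31)
ybins = np.linspace(0.0, 2.0, 21)

# Two-dimensional counts; counts[i, j] is the number of points in x bin i, y bin j
counts, xedges, yedges = np.histogram2d(df["x"].to_numpy(), df["y"].to_numpy(),
                                        bins=[xbins, ybins])

# np.digitize numbers the bins from 1 for in-range values, so subtract 1
ix = np.digitize(df["x"].to_numpy(), xbins) - 1
iy = np.digitize(df["y"].to_numpy(), ybins) - 1

# Row labels of the points with -0.1 <= x < 0 and 0.7 <= y < 0.8
labels = df.index[(ix == 14) & (iy == 7)]
print(int(counts[14, 7]), list(labels))
```

np.histogram2d returns only the counts, so the np.digitize indices are what tie each point back to its row in the DataFrame.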