Python pandas: how to locate rows by category from an aggregation?

python pandas

I have some data that I have binned. I then group by bin, use .count to count the entries in each bin, and query on the number of samples per bin:

import pandas as pd
import numpy as np

A = np.random.random(10000)
bins = np.arange(0, max(A), 0.03)

data_bins = pd.cut(A, bins = bins, precision = 100)

df = pd.DataFrame({"A": A,
                   "bin":  data_bins})\
    .sort_values(by = ["bin"])\
    .reset_index(drop = True)\
    .dropna()

print(df.head())

# For example, only take bins with more than 310 entries in each
valid_bins = df.groupby("bin")[["A"]].count().query("A > 310")

print(valid_bins)

So now, from valid_bins, I know which bins to look for in my large data set. How do I locate only the rows belonging to those bins in the original df?

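I think you need a Series with the same size as the original DataFrame, which GroupBy.transform gives you, so you can filter with boolean indexing:

df1 = df[df.groupby("bin")["A"].transform('count') > 310]

Or use the slower solution with DataFrameGroupBy.filter:

df1 = df.groupby("bin").filter(lambda x: x["A"].count() > 310)
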
print(df1.head())
            A           bin
674  0.080059  (0.06, 0.09]
675  0.074179  (0.06, 0.09]
676  0.062529  (0.06, 0.09]
677  0.087312  (0.06, 0.09]
678  0.070065  (0.06, 0.09]
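
Since the question asks specifically about reusing valid_bins, here is a minimal sketch of a third option: look up each row's bin in the index of the aggregated result with Series.isin. The small data set, the bin edges, and the min_count threshold of 2 are assumptions chosen so the result is easy to check by hand; they are not the asker's data. The sketch also compares the result against the transform and filter approaches above.

```python
import numpy as np
import pandas as pd

# Small deterministic data set (assumed for illustration only).
values = np.array([0.01, 0.02, 0.04, 0.05, 0.06, 0.07, 0.08, 0.10])
edges = np.array([0.0, 0.03, 0.09, 0.12])
df = pd.DataFrame({"A": values, "bin": pd.cut(values, bins=edges)})

min_count = 2  # hypothetical threshold, standing in for 310 in the question

# Per-bin counts; observed=False makes the handling of categorical keys explicit.
counts = df.groupby("bin", observed=False)["A"].count()
valid_bins = counts[counts > min_count]

# Option 1: keep rows whose bin appears in the index of the aggregated result.
by_isin = df[df["bin"].isin(valid_bins.index)]

# Option 2: broadcast the per-bin count back onto every row, then mask.
row_counts = df.groupby("bin", observed=False)["A"].transform("count")
by_transform = df[row_counts > min_count]

# Option 3: keep or drop whole groups with a Python-level predicate (slower).
by_filter = df.groupby("bin", observed=False).filter(lambda g: len(g) > min_count)

print(by_isin)
```

With this data only the middle bin holds more than two values, so all three options keep the same five rows. The isin route is convenient when valid_bins already exists. The transform route avoids the intermediate object entirely.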