Python: How do I filter a DataFrameGroupBy object in pandas?

python pandas

Suppose I have the following CSV file, data.csv:

id,category,price,source_id
1,food,1.00,4
2,drink,1.00,4
3,food,5.00,10
4,food,6.00,10
5,other,2.00,7
6,other,1.00,4

I want to group the data by (price, source_id), and I am doing that with the following code:

import pandas as pd


# header=0 marks the file's first row as a header, so `names` replaces it
# instead of the header line being read as a data row
df = pd.read_csv('data.csv', header=0, names=['id', 'category', 'price', 'source_id'])
grouped = df.groupby(['price', 'source_id'])
valid_categories = ['food', 'drink']
for price_source, group in grouped:
    # Skip groups that contain only one row
    if group.category.size < 2:
        continue

    categories = group.category.tolist()
    # Keep groups that contain 'other' plus at least one of 'food' / 'drink'
    if 'other' in categories and len(set(categories).intersection(valid_categories)) > 0:
        pass
        # Valid data in this case is:
        #
        # 1,food,1.00,4
        # 2,drink,1.00,4
        # 6,other,1.00,4
        #
        # I will need all of the above data, including the id, for other purposes

Is there another way to perform this filtering in pandas before the for loop? And if so, would it be any faster than the approach above?

The criteria for filtering are:

The size of the group is greater than 1
The grouped data contains the category `other` and at least one of `food` or `drink`

You can apply a custom filter directly to the GroupBy object, like this:

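# A group is kept only if all three conditions hold.
# len(x) counts rows; x.size would count rows * columns for a DataFrame group.
crit = lambda x: all((len(x) > 1,
                      'other' in x.category.values,
                      set(x.category) & {'food', 'drink'}))

# df is the DataFrame loaded from data.csv in the question
df.groupby(['price', 'source_id']).filter(crit)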

Output:

  category  id  price  source_id
0     food   1    1.0          4
1    drink   2    1.0          4
5    other   6    1.0          4
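
On the speed part of the question: `filter` still calls a Python function once per group, so a vectorized boolean mask built with `transform` may be faster when there are many groups. The following is only a sketch under stated assumptions, not a benchmarked result: the CSV text is inlined so it runs on its own, and the helper columns `is_other` and `is_valid` are hypothetical names introduced here for illustration.

```python
import io

import pandas as pd

# Same rows as data.csv in the question, inlined so the sketch is self-contained.
CSV_TEXT = """id,category,price,source_id
1,food,1.00,4
2,drink,1.00,4
3,food,5.00,10
4,food,6.00,10
5,other,2.00,7
6,other,1.00,4
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
keys = ['price', 'source_id']

# Per-row flags for the two category conditions
# (is_other / is_valid are hypothetical helper columns, not part of the data).
flags = df.assign(
    is_other=df['category'].eq('other'),
    is_valid=df['category'].isin(['food', 'drink']),
)
by_group = flags.groupby(keys)

# transform broadcasts each group-level result back onto that group's rows,
# so the three conditions combine into a single row-aligned boolean mask.
mask = (
    (by_group['id'].transform('size') > 1)
    & by_group['is_other'].transform('any')
    & by_group['is_valid'].transform('any')
)

result = df[mask]
print(result)
```

On this sample the mask selects the same three rows (ids 1, 2 and 6) as the `filter` call. Whether it is actually faster depends on the number and size of the groups, so it is worth timing both versions on the real data.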
