Python pandas: groupby on multiple columns
I'm learning pandas and come from a strong SQL background, so I need to rethink many habits and mental models. While I think I understand the groupby() method, I just can't figure out how to apply it on multiple columns.
Suppose we have a table in a database:
+----+--------------+-----------+--------------+-------+
| id | product_name | category | subcategory | price |
+----+--------------+-----------+--------------+-------+
| 1 | product1 | category1 | subcategory1 | 8.41 |
| 2 | product2 | category1 | subcategory1 | 62.74 |
| 3 | product3 | category1 | subcategory2 | 85.84 |
| 4 | product4 | category2 | subcategory2 | 32.71 |
| 5 | product5 | category2 | subcategory1 | 39.62 |
| 6 | product6 | category2 | subcategory1 | 37.43 |
| 7 | product7 | category3 | subcategory2 | 55.01 |
| 8 | product8 | category3 | subcategory1 | 26.91 |
| 9 | product9 | category3 | subcategory3 | 77.13 |
| 10 | product10 | category3 | subcategory3 | 40.79 |
+----+--------------+-----------+--------------+-------+
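Aggregating over multiple columns is very easy in SQL:
SELECT category, subcategory, AVG(price) AS avg_price FROM my_table GROUP BY category, subcategory
which returns the following: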
+-----------+--------------+-----------+
| category | subcategory | avg_price |
+-----------+--------------+-----------+
| category1 | subcategory1 | 35.575 |
| category1 | subcategory2 | 85.84 |
| category2 | subcategory1 | 38.525 |
| category2 | subcategory2 | 32.71 |
| category3 | subcategory1 | 26.91 |
| category3 | subcategory2 | 55.01 |
| category3 | subcategory3 | 58.96 |
+-----------+--------------+-----------+
So, in my clearly mistaken understanding, this should do the same thing in pandas:
df['price'].groupby(df[['category','subcategory']]).mean()
It raises ValueError: Grouper for '' not 1-dimensional, whereas:
df['price'].groupby(df['category']).mean()
works fine.
Can anyone help me?
You have to change your groupby syntax:
df.groupby(['category', 'subcategory'])['price'].mean()
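For reference, here is a minimal sketch that rebuilds the table from the question as a DataFrame and runs the multi-column groupby. The variable names (df, avg_series, avg_price) are only illustrative. The as_index=False argument and the named aggregation are additions of mine, meant to make the result look like the SQL output; the plain one-liner returns a Series indexed by (category, subcategory) instead.

```python
import pandas as pd

# Rebuild the example table from the question.
df = pd.DataFrame({
    "id": range(1, 11),
    "product_name": [f"product{i}" for i in range(1, 11)],
    "category": ["category1"] * 3 + ["category2"] * 3 + ["category3"] * 4,
    "subcategory": [
        "subcategory1", "subcategory1", "subcategory2",
        "subcategory2", "subcategory1", "subcategory1",
        "subcategory2", "subcategory1", "subcategory3", "subcategory3",
    ],
    "price": [8.41, 62.74, 85.84, 32.71, 39.62, 37.43, 55.01, 26.91, 77.13, 40.79],
})

# Pass a list of column names to group on several columns at once.
# The result is a Series with a two-level (category, subcategory) index.
avg_series = df.groupby(["category", "subcategory"])["price"].mean()

# Closer to the SQL result: keep the keys as ordinary columns and
# name the aggregate, like AVG(price) AS avg_price.
avg_price = (
    df.groupby(["category", "subcategory"], as_index=False)
      .agg(avg_price=("price", "mean"))
)
print(avg_price)
```

If you already have the Series form, calling .reset_index(name="avg_price") on it should give a similar flat table.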
I think you need to do this:
df.groupby(['category', 'subcategory'])['price'].mean()
Try: df.groupby(['category','subcategory'])['price'].mean()
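As for why the original attempt fails: df[['category','subcategory']] is a two-dimensional DataFrame, while each grouping key has to be one-dimensional. If you would rather start from the price Series, one option is to pass a list of one-dimensional Series as the keys. The sketch below uses a small made-up frame (not the data from the question) to compare the two call styles.

```python
import pandas as pd

# Small made-up frame, just to show the two call styles.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "subcategory": ["x", "y", "x", "x"],
    "price": [1.0, 3.0, 5.0, 7.0],
})

# A list of 1-D Series is accepted as grouping keys ...
by_series = df["price"].groupby([df["category"], df["subcategory"]]).mean()

# ... and yields the same group means as grouping the frame by column names.
by_names = df.groupby(["category", "subcategory"])["price"].mean()

print(by_series)
print(by_names)
```

Grouping the frame by column names is usually the shorter and more readable of the two.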
Thanks everyone, that did it!