Python pandas: transpose, group, and sum columns
I have a pandas dataframe that looks like this:
| Id | Filter 1 | Filter 2 | Filter 3 |
|----|----------|----------|----------|
| 25 | 0        | 1        | 1        |
| 25 | 1        | 0        | 1        |
| 25 | 0        | 0        | 1        |
| 30 | 1        | 0        | 1        |
| 31 | 1        | 0        | 1        |
| 31 | 0        | 1        | 0        |
| 31 | 0        | 0        | 1        |
I need to transform this table by adding a "Name" column containing the filter names and summing the values of the filter columns. The result should look like this:
| Id | Name     | Summ |
|----|----------|------|
| 25 | Filter 1 | 1    |
| 25 | Filter 2 | 1    |
| 25 | Filter 3 | 3    |
| 30 | Filter 1 | 1    |
| 30 | Filter 2 | 0    |
| 30 | Filter 3 | 1    |
| 31 | Filter 1 | 1    |
| 31 | Filter 2 | 1    |
| 31 | Filter 3 | 2    |
So far my only solution is to use `apply` on the dataframe grouped by the `Id` column, but that approach is too slow for my case: the dataset can exceed 40 columns and 50,000 rows. How can I do this with native pandas methods (e.g. pivot, transpose, groupby)?

Use `melt` and then `groupby`:
```python
df_new = (df.melt('Id', var_name='Name', value_name='Summ')
            .groupby(['Id', 'Name'])['Summ'].sum()
            .reset_index())
print(df_new)
```
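A self-contained sketch of this approach, rebuilding the sample dataframe from the question for illustration:

```python
import pandas as pd

# Rebuild the sample dataframe from the question.
df = pd.DataFrame({
    'Id':       [25, 25, 25, 30, 31, 31, 31],
    'Filter 1': [0, 1, 0, 1, 1, 0, 0],
    'Filter 2': [1, 0, 0, 0, 0, 1, 0],
    'Filter 3': [1, 1, 1, 1, 1, 0, 1],
})

# Unpivot the filter columns into long (Id, Name, Summ) rows,
# then sum the 0/1 flags per (Id, Name) pair.
df_new = (df.melt('Id', var_name='Name', value_name='Summ')
            .groupby(['Id', 'Name'], as_index=False)['Summ'].sum())
print(df_new)
```

`melt` turns the 7×4 wide table into 21 long rows, and the groupby collapses them back to one row per (Id, filter) pair.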
`stack`, then `groupby`:

```python
df.set_index('Id').stack().groupby(level=[0, 1]).sum().reset_index()
```

```
   Id   level_1  0
0  25  Filter 1  1
1  25  Filter 2  1
2  25  Filter 3  3
3  30  Filter 1  1
4  30  Filter 2  0
5  30  Filter 3  1
6  31  Filter 1  1
7  31  Filter 2  1
8  31  Filter 3  2
```
Short version:

```python
# Note: sum(level=0) is deprecated in recent pandas; prefer the groupby form.
df.set_index('Id').sum(level=0).stack()  # or: df.groupby('Id').sum().stack()
```
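A runnable sketch of the stack-based answer on the same sample data; the final `rename` (my addition, not in the original answer) maps the default `level_1`/`0` labels onto the `Name`/`Summ` headers from the question:

```python
import pandas as pd

# Rebuild the sample dataframe from the question.
df = pd.DataFrame({
    'Id':       [25, 25, 25, 30, 31, 31, 31],
    'Filter 1': [0, 1, 0, 1, 1, 0, 0],
    'Filter 2': [1, 0, 0, 0, 0, 1, 0],
    'Filter 3': [1, 1, 1, 1, 1, 0, 1],
})

# Sum per Id first, then stack the filter columns into rows.
out = (df.groupby('Id').sum().stack()
         .reset_index()
         .rename(columns={'level_1': 'Name', 0: 'Summ'}))
print(out)
```

Summing before stacking means `stack` only has to reshape 3 rows instead of 7, which is part of why this route benchmarks well.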
Using `filter` and `melt`:

```python
df.filter(like='Filter').groupby(df.Id).sum().T.reset_index().melt(id_vars='index')
```

```
      index  Id  value
0  Filter 1  25      1
1  Filter 2  25      1
2  Filter 3  25      3
3  Filter 1  30      1
4  Filter 2  30      0
5  Filter 3  30      1
6  Filter 1  31      1
7  Filter 2  31      1
8  Filter 3  31      2
```
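For completeness, a self-contained run of this variant on the sample data; the trailing `rename` and `sort_values` (my additions, not in the original answer) line the result up with the Id/Name/Summ table from the question:

```python
import pandas as pd

# Rebuild the sample dataframe from the question.
df = pd.DataFrame({
    'Id':       [25, 25, 25, 30, 31, 31, 31],
    'Filter 1': [0, 1, 0, 1, 1, 0, 0],
    'Filter 2': [1, 0, 0, 0, 0, 1, 0],
    'Filter 3': [1, 1, 1, 1, 1, 0, 1],
})

# Keep only the filter columns, sum them per Id, transpose so the
# filter names become rows, then unpivot back into long form.
out = (df.filter(like='Filter').groupby(df.Id).sum()
         .T.reset_index().melt(id_vars='index')
         .rename(columns={'index': 'Name', 'value': 'Summ'})
         .sort_values(['Id', 'Name'], ignore_index=True))
print(out)
```

After the transpose, the columns inherit the name `Id` from the index, which is why `melt` labels the variable column `Id` automatically.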
Thanks for the reply, Ninja) This is also the fastest approach: on my test dataframe it handles 50 columns and 50,000 rows in under 5 seconds.

This works, but on my test set it is much slower than the previous answer (melt + groupby + sum + reset_index), taking about 20 seconds.

@w00lf have you tried `df.groupby('Id').sum().stack()`?