Python: partial sums and subtotals with pandas
I am trying to build a table with subtotals like the one shown in the documentation, but either the code does not work with the latest pandas version (0.18.1), or the example is wrong when there are multiple columns instead of one. The result is the table below:
                                                                      2014    2015    2016
project__name person__username  activity__name     issue__subject
Influenster   employee1         Development                          161.0   122.0   104.0
                                                   Fix bug            22.0     0.0     0.0
                                                   Refactor view       0.0     7.0     0.0
                                Quality assurance                    172.0   158.0   161.0
              employee2         Development                          119.0   137.0   155.0
                                Quality assurance                    193.0   186.0   205.0
              employee3         Development        Refactor view       0.0     0.0     1.0
Profit tools  employee1         Development                          177.0   136.0   216.0
                                Quality assurance                    162.0   122.0   182.0
              employee2         Development                          154.0   168.0   124.0
                                Quality assurance                    130.0   183.0   192.0
                                                   Fix bug            22.0     0.0     0.0
All                                                                 1312.0  1219.0  1340.0
The output I want looks like this:
                                                                      2014    2015    2016
project__name person__username  activity__name     issue__subject
Influenster   employee1         Development                          161.0   122.0   104.0
                                                   Fix bug            22.0     0.0     0.0
                                                   Refactor view       0.0     7.0     0.0
                                                   Total               xxx     xxx     xxx
                                Quality assurance                    172.0   158.0   161.0
                                                   Total               xxx     xxx     xxx
                                Total                                  xxx     xxx     xxx
              employee2         Development                          119.0   137.0   155.0
                                                   Total               xxx     xxx     xxx
                                Quality assurance                    193.0   186.0   205.0
                                                   Total               xxx     xxx     xxx
                                Total                                  xxx     xxx     xxx
              employee3         Development        Refactor view       0.0     0.0     1.0
                                                   Total               xxx     xxx     xxx
                                Total                                  xxx     xxx     xxx
              Total                                                    xxx     xxx     xxx
Profit tools  employee1         Development                          177.0   136.0   216.0
                                                   Total               xxx     xxx     xxx
                                Quality assurance                    162.0   122.0   182.0
                                                   Total               xxx     xxx     xxx
                                Total                                  xxx     xxx     xxx
              employee2         Development                          154.0   168.0   124.0
                                                   Total               xxx     xxx     xxx
                                Quality assurance                    130.0   183.0   192.0
                                                   Fix bug            22.0     0.0     0.0
                                                   Total               xxx     xxx     xxx
                                Total                                  xxx     xxx     xxx
              Total                                                    xxx     xxx     xxx
All                                                                 1312.0  1219.0  1340.0
Any help on how to achieve this would be greatly appreciated.
Recursive groupby and apply.
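The body of the "recursive groupby and apply" suggestion is not preserved in this thread, so what follows is only a minimal sketch of how such an approach might look, not the original answer's code. It recurses one key level at a time with an explicit loop over `groupby` (a plain loop is swapped in for `apply` here) and appends one `Total` row after each group. The helper name `add_subtotals` and the toy column names are hypothetical.

```python
import pandas as pd


def add_subtotals(df, keys, values, depth=0, label="Total"):
    """Group by keys[depth], recurse into each group, then append its subtotal row."""
    if depth == len(keys) - 1:
        # Innermost key level: return the detail rows unchanged.
        return df[keys + values]
    parts = []
    for _, group in df.groupby(keys[depth], sort=False):
        # Detail rows (and deeper subtotals) for this group come first.
        parts.append(add_subtotals(group, keys, values, depth + 1, label))
        # The subtotal row keeps the outer key values, puts the label in the
        # next key column, and leaves any deeper key columns blank.
        row = {k: "" for k in keys}
        for k in keys[:depth + 1]:
            row[k] = group[k].iloc[0]
        row[keys[depth + 1]] = label
        row.update(group[values].sum().to_dict())
        parts.append(pd.DataFrame([row], columns=keys + values))
    return pd.concat(parts, ignore_index=True)


# Made-up data, much smaller than the question's table.
toy = pd.DataFrame({
    "project": ["A", "A", "A", "B"],
    "person": ["e1", "e1", "e2", "e1"],
    "activity": ["Dev", "QA", "Dev", "Dev"],
    "2014": [1.0, 2.0, 3.0, 4.0],
})
result = add_subtotals(toy, ["project", "person", "activity"], ["2014"])
print(result)
```

A grand-total row is not produced by this sketch; it could be appended separately from `toy["2014"].sum()`.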
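Consider running pivot_table with stack at three levels and concatenating the results into a final groupby object. As mentioned, the documentation example does work once you use .stack() on the corresponding pivot_table columns value: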
import pandas as pd

# df is assumed to be the flat DataFrame behind the question's table
years = ['2014', '2015', '2016']
keys = ['project__name', 'person__username', 'activity__name', 'issue__subject']

# ISSUE_SUBJECT PIVOT (subtotals per activity)
pt1 = pd.pivot_table(data=df, values=years,
                     columns=['issue__subject'], aggfunc='sum',
                     index=['project__name', 'person__username', 'activity__name'],
                     margins=True, margins_name='Total')
pt1 = pt1.stack().reset_index()   # stack() returns a new object, so reassign

# ACTIVITY_NAME PIVOT (subtotals per person)
pt2 = pd.pivot_table(data=df, values=years,
                     columns=['activity__name'], aggfunc='sum',
                     index=['project__name', 'person__username'],
                     margins=True, margins_name='Total')
pt2 = pt2.stack().reset_index()

# PERSON_USERNAME PIVOT (subtotals per project)
pt3 = pd.pivot_table(data=df, values=years,
                     columns=['person__username'],
                     aggfunc='sum', index=['project__name'],
                     margins=True, margins_name='Total')
pt3 = pt3.stack().reset_index()

# CONCATENATE ALL THREE (only the 'Total' rows are taken from pt2 and pt3)
gdf = pd.concat([pt1,
                 pt2[(pt2['project__name'] == 'Total') |
                     (pt2['activity__name'] == 'Total')],
                 pt3[(pt3['project__name'] == 'Total') |
                     (pt3['person__username'] == 'Total')]]).reset_index(drop=True)

# REPLACE NaNs IN THE KEY COLUMNS
# (fillna avoids relying on how a row-wise apply reshapes its result,
# which differs across pandas versions)
gdf[keys] = gdf[keys].fillna('')

# FINAL GROUPBY (A COUNT USED TO RENDER THE GROUPED LAYOUT)
gdf = gdf.groupby(keys + years).size()
Output
project__name person__username  activity__name     issue__subject     2014    2015    2016
Influenster   Total                                                  667.0   610.0   626.0  1
              employee1         Development                          161.0   122.0   104.0  1
                                                   Fix bug            22.0     0.0     0.0  1
                                                   Refactor view       0.0     7.0     0.0  1
                                                   Total             183.0   129.0   104.0  1
                                Quality assurance                    172.0   158.0   161.0  1
                                                   Total             172.0   158.0   161.0  1
                                Total                                355.0   287.0   265.0  1
              employee2         Development                          119.0   137.0   155.0  1
                                                   Total             119.0   137.0   155.0  1
                                Quality assurance                    193.0   186.0   205.0  1
                                                   Total             193.0   186.0   205.0  1
                                Total                                312.0   323.0   360.0  1
              employee3         Development        Refactor view       0.0     0.0     1.0  1
                                                   Total               0.0     0.0     1.0  1
                                Total                                  0.0     0.0     1.0  1
Profit tools  Total                                                  645.0   609.0   714.0  1
              employee1         Development                          177.0   136.0   216.0  1
                                                   Total             177.0   136.0   216.0  1
                                Quality assurance                    162.0   122.0   182.0  1
                                                   Total             162.0   122.0   182.0  1
                                Total                                339.0   258.0   398.0  1
              employee2         Development                          154.0   168.0   124.0  1
                                                   Total             154.0   168.0   124.0  1
                                Quality assurance                    130.0   183.0   192.0  1
                                                   Fix bug            22.0     0.0     0.0  1
                                                   Total             152.0   183.0   192.0  1
                                Total                                306.0   351.0   316.0  1
Total                                                               1268.0  1212.0  1339.0  1
                                                   Fix bug            44.0     0.0     0.0  1
                                                   Refactor view       0.0     7.0     1.0  1
                                                   Total            1312.0  1219.0  1340.0  1
                                Development                          633.0   570.0   600.0  1
                                Quality assurance                    679.0   649.0   740.0  1
                                Total                               1312.0  1219.0  1340.0  1
              Total                                                 1312.0  1219.0  1340.0  1
              employee1                                              694.0   545.0   663.0  1
              employee2                                              618.0   674.0   676.0  1
              employee3                                                0.0     0.0     1.0  1
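To see on a small scale why the stack step matters, here is a sketch with made-up data (the frame `toy` and its values are assumptions, not the question's data). With `margins=True`, pivot_table adds a `Total` column next to the per-activity columns; stacking moves that column level into the rows, so each person gets a `Total` row alongside the detail rows.

```python
import pandas as pd

# Made-up data: two people, two activities, one year column.
toy = pd.DataFrame({
    "person": ["e1", "e1", "e2", "e2"],
    "activity": ["Dev", "QA", "Dev", "QA"],
    "2014": [1.0, 2.0, 3.0, 4.0],
})

# Wide layout: one column per activity plus a 'Total' margin column,
# and a 'Total' margin row at the bottom.
wide = pd.pivot_table(toy, values=["2014"], index=["person"],
                      columns=["activity"], aggfunc="sum",
                      margins=True, margins_name="Total")

# Long layout: the activity level (including 'Total') becomes a row label.
long = wide.stack().reset_index()

# Pick out the per-person subtotal rows.
totals = long[long["activity"] == "Total"].set_index("person")["2014"].to_dict()
print(totals)  # e1 -> 3.0, e2 -> 7.0, Total -> 10.0
```

Newer pandas releases may print a FutureWarning about the stack implementation here; the sketch avoids missing combinations so that the result should not depend on it.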
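The code in the documentation does work; however, you need to use pt.stack() and pass a columns value to pivot_table, for example columns=['issue__subject'] from the dataframe. It will not show totals for every grouping level, though, only for each column value.
Thanks @Parfait. The code works; I was just calling stack() on the table, thinking it modified the object in place rather than returning a new one. For the rename statement I get a TypeError: 'tuple' object is not callable error. Is that related to the pandas version?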
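The in-place misunderstanding mentioned in that comment can be checked with a tiny example. This is a minimal sketch on a made-up frame (the names `wide` and `stacked` are illustrative only): calling stack() without assigning the result leaves the original untouched.

```python
import pandas as pd

wide = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])

# The return value is discarded here, so nothing changes.
wide.stack()
shape_after_call = wide.shape  # still the original 2 x 2 frame

# Assigning the result is what gives the long layout.
stacked = wide.stack()
print(shape_after_call, stacked.shape)
```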