Python: How do I custom-sort a MultiIndex?
The code below produces a pandas table named out:
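import pandas as pd
import numpy as np

df = pd.DataFrame({'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'],
                   'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'],
                   'Position': [10, 33, -34, 87, 43, 99]})
df = df[['Book', 'Trader', 'Position']]

table = pd.pivot_table(df, index=['Book', 'Trader'], values=['Position'], aggfunc=np.sum)
print(table)

# Per-book subtotals, indexed as (Book, 'Total')
tab_tots = table.groupby(level='Book').sum()
tab_tots.index = pd.MultiIndex.from_arrays(
    [tab_tots.index, ['Total'] * len(tab_tots)], names=['Book', 'Trader'])
print(tab_tots)

# Grand-total row; pd.concat is used because DataFrame.append was removed in pandas 2.0
grand = pd.DataFrame({'Position': [table['Position'].sum()]},
                     index=pd.MultiIndex.from_tuples([('Grand', 'Total')],
                                                     names=['Book', 'Trader']))
out = pd.concat([pd.concat([table, tab_tots]).sort_index(), grand])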
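In the table out, sort_index() places each Total row in its alphabetical position among the traders (in book B1, for example, Total falls between T1 and Z2). I would like the Total row of each book to always come at the bottom instead. So basically I still want to sort alphabetically, but I always want "Total" to come last. Can someone suggest an adjustment to my code that produces the desired output?

pandas has built-in support for computing marginal totals in the pivot_table function: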
table = pd.pivot_table(df,
                       index='Book',
                       columns='Trader',
                       values='Position',
                       aggfunc=np.sum,
                       margins=True,
                       margins_name='Total').drop('Total').stack()
table[('Grand', 'Total')] = table.sum()
table.name = 'Position'
table.reset_index()
     Book Trader  Position
0      B1     T1      10.0
1      B1     Z2      33.0
2      B1  Total      43.0
3      B2     Z2     -34.0
4      B2  Total     -34.0
5      B3     T1      87.0
6      B3     T2      99.0
7      B3     U3      43.0
8      B3  Total     229.0
13  Grand  Total     238.0
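
To see what margins=True contributes before the drop and stack steps, it may help to look at the intermediate wide table. The sketch below assumes the same df as in the question; the names wide, book_subtotals and grand_total are only illustrative, and aggfunc='sum' is used in place of np.sum.

```python
import pandas as pd

df = pd.DataFrame({
    'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'],
    'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'],
    'Position': [10, 33, -34, 87, 43, 99],
})

# Wide table: one row per Book, one column per Trader, plus a 'Total'
# margin row and a 'Total' margin column added by pivot_table itself.
wide = pd.pivot_table(df, index='Book', columns='Trader', values='Position',
                      aggfunc='sum', margins=True, margins_name='Total')

# The 'Total' column holds the per-book subtotals (margin row excluded here) ...
book_subtotals = wide['Total'].drop('Total')
# ... and the bottom-right cell holds the grand total.
grand_total = wide.loc['Total', 'Total']

print(wide)
print(book_subtotals)
print(grand_total)
```

Because the margin column is appended after the trader columns, stacking the wide table leaves each book's Total after its traders, which is why no extra sorting is needed in this approach.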
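
Solution based on sorting the MultiIndex

This solution picks up from your out DataFrame. You can convert Book and Trader to the pandas Categorical type, which lets you define a custom sort order by passing ordered=True together with a list of categories given in the order you want them sorted: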
out = out.reset_index()

trader_cats = pd.Categorical(out['Trader'],
                             categories=sorted(df.Trader.unique()) + ['Total'],
                             ordered=True)
book_cats = pd.Categorical(out['Book'],
                           categories=sorted(df.Book.unique()) + ['Grand'],
                           ordered=True)

out['Trader'] = trader_cats
out['Book'] = book_cats
out.set_index(['Book', 'Trader'], inplace=True)
out.sort_index(level=['Book', 'Trader'])
              Position
Book  Trader
B1    T1            10
      Z2            33
      Total         43
B2    Z2           -34
      Total        -34
B3    T1            87
      T2            99
      U3            43
      Total        229
Grand Total        238
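
The effect of the ordered Categorical can be seen in isolation on a small Series. This is a minimal sketch with hypothetical labels (not taken from out), chosen so that 'Total' falls in the middle when sorted as plain strings.

```python
import pandas as pd

# Hypothetical labels for illustration only.
labels = pd.Series(['Z2', 'Total', 'T1', 'U3'])

# Plain string sorting compares character by character,
# so 'Total' ends up between 'T1' and 'U3'.
alphabetical = labels.sort_values().tolist()

# An ordered Categorical sorts by position in the categories list instead,
# so listing 'Total' last forces it to the end.
custom = pd.Series(pd.Categorical(labels,
                                  categories=['T1', 'U3', 'Z2', 'Total'],
                                  ordered=True))
custom_sorted = custom.sort_values().tolist()

print(alphabetical)   # ['T1', 'Total', 'U3', 'Z2']
print(custom_sorted)  # ['T1', 'U3', 'Z2', 'Total']
```

The same rule applies when the categorical values are index levels, which is what sort_index relies on in the solution above.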
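
You can use groupby with unstack to reshape the data. It is then easy to add a new Total column and compute the Grand Total sum. Finally, the new row is added with loc: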
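
The steps just described can be sketched one at a time as follows. This assumes the df from the question; the variable names wide, stacked and result are illustrative, and the explicit dropna() is an addition so that the sketch does not depend on whether stack() drops missing values by default.

```python
import pandas as pd

df = pd.DataFrame({
    'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'],
    'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'],
    'Position': [10, 33, -34, 87, 43, 99],
})

# 1. Reshape: one row per Book, one column per Trader.
wide = df.groupby(['Book', 'Trader'])['Position'].sum().unstack()

# 2. Add the per-book subtotal as a new last column,
#    so it stays last within each book after stacking.
wide['Total'] = wide.sum(axis=1)

# 3. Back to long form; dropna() removes Book/Trader pairs with no data.
stacked = wide.stack().dropna()

# 4. Add the grand-total row by assigning to a new MultiIndex key.
stacked.loc[('Grand', 'Total')] = wide['Total'].sum()

result = stacked.reset_index(name='Position')
print(result)
```

No sorting happens after step 2, so the position of Total depends only on the column order of the wide table.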

Compared with the other solution:
def jez(df):
    # groupby + unstack, then add the Total column and the Grand Total row
    df1 = df.groupby(['Book', 'Trader']).Position.sum().unstack()
    df1['Total'] = df1.sum(axis=1)
    all_sum = df1['Total'].sum()
    df1 = df1.stack()
    df1.loc[('Grand', 'Total')] = all_sum
    df1 = df1.reset_index(name='Position')
    return df1

def ted1(df):
    # pivot_table with margins, then rename and filter the margin rows
    table = pd.pivot_table(df,
                           index=['Book'],
                           columns=['Trader'],
                           values=['Position'],
                           aggfunc=np.sum,
                           margins=True,
                           margins_name='total')
    return (table.stack()
                 .rename({'total': 'Total'})
                 .reset_index(1)
                 .rename({'Total': 'Grand'})
                 .reset_index()
                 .query('Book != "Grand" | Trader == "Total"'))

print(jez(df))
print(ted1(df))

In [419]: %timeit (jez(df))
100 loops, best of 3: 5.65 ms per loop

In [420]: %timeit (ted1(df))
10 loops, best of 3: 26.5 ms per loop
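
Conclusion:
The groupby + unstack solution is faster for the subtotals, and summing the subtotals is also easier.
pivot_table is more convenient for pivoting (a single function), but handling the subtotal and grand-total rows with it is more complicated.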