
Python: How to custom-sort a MultiIndex?


The code below produces a pandas table named out:

import pandas as pd 
import numpy as np

df = pd.DataFrame({'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'], 
                   'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'], 
                   'Position':[10, 33, -34, 87, 43, 99]})
df = df[['Book', 'Trader', 'Position']]

table = pd.pivot_table(df, index=['Book', 'Trader'], values=['Position'], aggfunc=np.sum)

print(table)

tab_tots = table.groupby(level='Book').sum()
tab_tots.index = [tab_tots.index, ['Total'] * len(tab_tots)]
print(tab_tots)

out = pd.concat(
    [table, tab_tots]
).sort_index().append(
    table.sum().rename(('Grand', 'Total'))
)
out
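In the resulting out, sort_index() orders the Total rows alphabetically among the traders, so for B1 the Total row lands between T1 and Z2.

But I want a table that always puts Total at the bottom of each group. So basically I still want to sort alphabetically, but with Total always last. Can anyone suggest an adjustment to my code that gives the desired output?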

Pandas has built-in functionality in the pivot_table function for computing marginal totals.

table = pd.pivot_table(df,
               index='Book',
               columns='Trader',
               values='Position',
               aggfunc='sum',
               margins=True,
               margins_name='Total').drop('Total').stack().dropna()
# Sum only the per-book Total rows so that no position is counted twice
table.loc[('Grand', 'Total')] = table.xs('Total', level='Trader').sum()
table.name = 'Position'
table.reset_index()

    Book Trader  Position
0     B1     T1      10.0
1     B1     Z2      33.0
2     B1  Total      43.0
3     B2     Z2     -34.0
4     B2  Total     -34.0
5     B3     T1      87.0
6     B3     T2      99.0
7     B3     U3      43.0
8     B3  Total     229.0
9  Grand  Total     238.0
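To see what margins=True adds before the table is stacked, the following minimal sketch inspects the wide pivot table directly. It reuses the sample data from the question; the names wide, book_subtotals and grand_total are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'],
                   'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'],
                   'Position': [10, 33, -34, 87, 43, 99]})

# margins=True appends a 'Total' row and a 'Total' column to the wide table.
wide = pd.pivot_table(df, index='Book', columns='Trader', values='Position',
                      aggfunc='sum', margins=True, margins_name='Total')

# The 'Total' column holds one subtotal per book (hypothetical helper names).
book_subtotals = wide['Total'].drop('Total')
# The cell where the 'Total' row and 'Total' column meet is the grand total.
grand_total = wide.loc['Total', 'Total']

print(book_subtotals)
print(grand_total)
```

The Total column already carries the per-book subtotals, which is why only the Total row needs to be dropped before stacking.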
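Solution based on sorting the MultiIndex

This solution continues from your out DataFrame. You can convert Book and Trader to the pandas categorical type, which lets you customize the sort order by passing ordered=True together with a list of categories arranged in the order you want them sorted.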

out = out.reset_index()

trader_cats = pd.Categorical(out['Trader'], 
                   categories=sorted(df.Trader.unique()) + ['Total'], 
                   ordered=True)

book_cats = pd.Categorical(out['Book'], 
                   categories=sorted(df.Book.unique()) + ['Grand'], 
                   ordered=True)

out['Trader'] = trader_cats
out['Book'] = book_cats
out.set_index(['Book', 'Trader'], inplace=True)
out.sort_index(level=['Book', 'Trader'])

              Position
Book  Trader          
B1    T1            10
      Z2            33
      Total         43
B2    Z2           -34
      Total        -34
B3    T1            87
      T2            99
      U3            43
      Total        229
Grand Total        238
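The same ordered-categorical idea can be sketched end to end without depending on the out table from the question. This is only an illustration under assumptions: the frame names detail, subtotals, grand, flat and result are hypothetical, and the rows are built with groupby and pd.concat rather than the question's code.

```python
import pandas as pd

df = pd.DataFrame({'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'],
                   'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'],
                   'Position': [10, 33, -34, 87, 43, 99]})

# Detail rows, per-book subtotal rows, and a single grand-total row.
detail = df.groupby(['Book', 'Trader'], as_index=False)['Position'].sum()
subtotals = (df.groupby('Book', as_index=False)['Position'].sum()
               .assign(Trader='Total'))
grand = pd.DataFrame({'Book': ['Grand'], 'Trader': ['Total'],
                      'Position': [df['Position'].sum()]})
flat = pd.concat([detail, subtotals, grand], ignore_index=True)

# Ordered categories: alphabetical labels first, the total label last.
flat['Book'] = pd.Categorical(
    flat['Book'], categories=sorted(df['Book'].unique()) + ['Grand'],
    ordered=True)
flat['Trader'] = pd.Categorical(
    flat['Trader'], categories=sorted(df['Trader'].unique()) + ['Total'],
    ordered=True)

# Sorting follows the category order, so Total ends up last in each book.
result = flat.sort_values(['Book', 'Trader']).reset_index(drop=True)
print(result)
```

Sorting the flat columns first and setting the index afterwards keeps the custom order visible in plain columns, which can be easier to check than a categorical MultiIndex.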
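You can use groupby with unstack to reshape the data. It is then easy to add a new Total column, compute the Grand Total, and stack. Finally, add the new row with loc.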

Comparison with the other solution:

def jez(df):
    df1 = df.groupby(['Book','Trader']).Position.sum().unstack()
    df1['Total'] = df1.sum(1)
    all_sum = df1['Total'].sum()
    df1 = df1.stack()
    df1.loc[('Grand','Total')] = all_sum
    df1 = df1.reset_index(name='Position')
    return (df1)


def ted1(df):
    table = pd.pivot_table(df, 
                           index=['Book'], 
                           columns=['Trader'], 
                           values=['Position'], 
                           aggfunc=np.sum, 
                           margins=True, 
                           margins_name='total')
    return table.stack()\
                  .rename({'total':'Total'})\
                  .reset_index(1)\
                  .rename({'Total':'Grand'})\
                  .reset_index()\
                  .query('Book != "Grand" | Trader == "Total"')


print (jez(df))
print (ted1(df))

In [419]: %timeit (jez(df))
100 loops, best of 3: 5.65 ms per loop

In [420]: %timeit (ted1(df))
10 loops, best of 3: 26.5 ms per loop
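Conclusion:

The groupby + unstack solution is faster for subtotals, and summing the subtotals is also easier.

pivot_table is easier for pivoting (a single function), but adding the subtotal and grand-total rows takes more complicated manipulation.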
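Outside IPython, a comparison like this can be approximated with the standard timeit module. The sketch below times only the reshaping step of each approach; the function names with_groupby and with_pivot are hypothetical, and the numbers it prints depend on the machine and the pandas version, so they will not match the timings quoted above.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'],
                   'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'],
                   'Position': [10, 33, -34, 87, 43, 99]})


def with_groupby(frame):
    # Reshape with groupby + unstack (hypothetical helper name).
    return frame.groupby(['Book', 'Trader'])['Position'].sum().unstack()


def with_pivot(frame):
    # Reshape with pivot_table (hypothetical helper name).
    return pd.pivot_table(frame, index='Book', columns='Trader',
                          values='Position', aggfunc='sum')


# Total seconds for 50 calls of each function.
t_groupby = timeit.timeit(lambda: with_groupby(df), number=50)
t_pivot = timeit.timeit(lambda: with_pivot(df), number=50)
print(f'groupby + unstack: {t_groupby:.4f} s for 50 runs')
print(f'pivot_table:       {t_pivot:.4f} s for 50 runs')
```

Both helpers return the same wide table, so any difference in the measured time comes from the reshaping call itself.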
