Python: Sort a pivot table by row and column totals (Python, pandas, pivot table)

python pandas

I have (for example) this DataFrame:

 COLUMN1 COLUMN2  VALUE
0    0102    1020      1
1    0102    1220      8
2    0102    1210      2
3    0103    1020      1
4    0103    1210      3
5    0103    1222      8
6    0104    1020      3
7    0104    1120      2

(In reality, it is about 9,000 rows long.)

From this, I created a pivot table with COLUMN1 as the index, COLUMN2 as the columns, and the values taken from VALUE, with NaN filled with 0:

COLUMN2  1020  1120  1210  1220  1222
COLUMN1                              
0102        1     0     2     8     0
0103        1     0     3     0     8
0104        3     2     0     0     0

I need to sort this pivot by the row totals and then by the column totals. The result should look like this:

COLUMN2  1220  1222  1020  1210  1120| (GT)
COLUMN1                              |     HIGHEST
0103        0     8     1     3     0| (12) |
0102        8     0     1     2     0| (11) |
0104        0     0     3     0     2| (5)  V
--------------------------------------
(GT:        8     8     5     5     2)
 HIGHEST----------------------------->  LOWEST

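Is there a way to do this? I tried building the pivot by passing the index and columns as lists, sorted in the order I want them to appear, but pandas seems to sort them A-Z automatically when it creates the table.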

Sample code:

import pandas as pd

exampledata=[['0102','1020',1],['0102','1220',8],['0102','1210',2],
             ['0103','1020',1],['0103','1210',3], ['0103','1222',8],
             ['0104','1020',3],['0104','1120',2]]

df = pd.DataFrame(exampledata,columns=['COLUMN1','COLUMN2','VALUE'])
print(df)
pivot = pd.pivot_table(df,
                       index='COLUMN1',
                       columns='COLUMN2',
                       values='VALUE',
                       aggfunc='sum',
                       fill_value=0)
print(pivot)

I would try something like this:

pivot['sum_cols'] = pivot.sum(axis=1)
pivot = pivot.sort_values('sum_cols' , ascending=False)
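
The snippet above covers only the row totals and leaves a helper column in the table. A minimal sketch of the full two-way sort, assuming the same data as in the question, could compute both sets of totals first and then reorder with `.loc`. The names `row_order`, `col_order`, and `sorted_pivot` are illustrative and not from the original post.

```python
import pandas as pd

exampledata = [['0102', '1020', 1], ['0102', '1220', 8], ['0102', '1210', 2],
               ['0103', '1020', 1], ['0103', '1210', 3], ['0103', '1222', 8],
               ['0104', '1020', 3], ['0104', '1120', 2]]
df = pd.DataFrame(exampledata, columns=['COLUMN1', 'COLUMN2', 'VALUE'])

pivot = pd.pivot_table(df, index='COLUMN1', columns='COLUMN2',
                       values='VALUE', aggfunc='sum', fill_value=0)

# Labels ordered by their totals, highest first.
# kind='mergesort' is a stable sort, chosen here so ties behave predictably.
row_order = pivot.sum(axis=1).sort_values(ascending=False, kind='mergesort').index
col_order = pivot.sum(axis=0).sort_values(ascending=False, kind='mergesort').index

# Reorder rows and columns in one step; no helper column is added.
sorted_pivot = pivot.loc[row_order, col_order]
print(sorted_pivot)
```

Because the totals are computed separately, the pivot itself never gains an extra column that would have to be dropped afterwards.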

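The pivot table's index and columns (the values of COLUMN1 and COLUMN2) are strings, and strings are sorted from A to Z. Perhaps you should convert them to an integer type so that they are sorted numerically, given that both the columns and the index allow integer types. Note that this turns '0102' into 102:
df = df.astype('int')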
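
To make the difference concrete, here is a small sketch using a made-up frame (the values are illustrative and not from the question). It shows how string labels and integer labels order differently, and that the conversion drops leading zeros.

```python
import pandas as pd

# Small illustrative frame (made-up values, not from the question).
df = pd.DataFrame({'COLUMN1': ['0102', '0103', '0020'],
                   'COLUMN2': ['1020', '999', '1210'],
                   'VALUE': [1, 2, 3]})

# As strings, labels compare character by character, so '999' sorts last.
string_order = sorted(df['COLUMN2'])
print(string_order)

# As integers, labels compare numerically, but leading zeros are lost.
as_int = df.astype('int')
int_order = sorted(as_int['COLUMN2'])
print(int_order)
print(as_int['COLUMN1'].tolist())
```

If the leading zeros matter for display, keeping the labels as strings and sorting by totals instead of by label avoids the issue.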

Now, your pivot_table call returns a DataFrame, and you can sort it by index or by columns the same way as any other DataFrame.
To sort the index (rows), do the following:
pivot = pivot.sort_index(ascending=False)

To sort the columns, do the following:
pivot = pivot.sort_index(axis=1, ascending=False)
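
Keep in mind that `sort_index` orders by the labels themselves, not by the totals, so on its own it does not give the ordering asked for in the question. A short sketch on the question's data, showing what each `axis` value reorders (the names `by_rows` and `by_cols` are illustrative):

```python
import pandas as pd

exampledata = [['0102', '1020', 1], ['0102', '1220', 8], ['0102', '1210', 2],
               ['0103', '1020', 1], ['0103', '1210', 3], ['0103', '1222', 8],
               ['0104', '1020', 3], ['0104', '1120', 2]]
df = pd.DataFrame(exampledata, columns=['COLUMN1', 'COLUMN2', 'VALUE'])
pivot = df.pivot_table(index='COLUMN1', columns='COLUMN2', values='VALUE',
                       aggfunc='sum', fill_value=0)

# axis=0 (the default) reorders the rows by their labels, descending.
by_rows = pivot.sort_index(ascending=False)
print(by_rows.index.tolist())

# axis=1 reorders the columns by their labels, descending.
by_cols = pivot.sort_index(axis=1, ascending=False)
print(by_cols.columns.tolist())
```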

pivot_table has a margins option, which comes in handy in this case:
(df.pivot_table(index='COLUMN1', columns='COLUMN2', values='VALUE',
               aggfunc='sum', fill_value=0, margins=True)   # pivot with margins 
   .sort_values('All', ascending=False)  # sort by row sum
   .drop('All', axis=1)                  # drop column `All`
   .sort_values('All', ascending=False, axis=1) # sort by column sum
   .drop('All')    # drop row `All`
)

Output:
COLUMN2  1220  1222  1020  1210  1120
COLUMN1                              
103         0     8     1     3     0
102         8     0     1     2     0
104         0     0     3     0     2
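
If a real label in the data could collide with 'All', the `margins_name` parameter of `pivot_table` lets the totals use a different label. The sketch below wraps the same chain in a helper; the function name `pivot_sorted_by_totals` and the label `'__total__'` are hypothetical choices made for illustration.

```python
import pandas as pd

def pivot_sorted_by_totals(df, index, columns, values, total='__total__'):
    """Pivot, then order rows and columns by their totals, highest first."""
    return (df.pivot_table(index=index, columns=columns, values=values,
                           aggfunc='sum', fill_value=0,
                           margins=True, margins_name=total)
              .sort_values(total, ascending=False)          # rows by row total
              .drop(total, axis=1)                          # drop the totals column
              .sort_values(total, ascending=False, axis=1)  # columns by column total
              .drop(total))                                 # drop the totals row

exampledata = [['0102', '1020', 1], ['0102', '1220', 8], ['0102', '1210', 2],
               ['0103', '1020', 1], ['0103', '1210', 3], ['0103', '1222', 8],
               ['0104', '1020', 3], ['0104', '1120', 2]]
df = pd.DataFrame(exampledata, columns=['COLUMN1', 'COLUMN2', 'VALUE'])

result = pivot_sorted_by_totals(df, 'COLUMN1', 'COLUMN2', 'VALUE')
print(result)
```

Here the labels stay as strings, so the leading zeros in COLUMN1 are preserved in the result.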

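Nice one. This solution does not compute the sums by hand but uses the built-in margins option of the pivot. +1

This is exactly what I wanted, and it works like a charm. Thank you!

Thanks, those axis parameters helped me a lot!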