Pandas: subtotals of DataFrame columns using a pivot table

Find the subtotals of columns using a pivot table on a DataFrame:

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7]})

print(df)
pd.pivot_table(df, values=['D'], index=['A'], columns=['C', 'B'],
               aggfunc={'D': 'sum'}, margins=True, fill_value=0, margins_name="Total")
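A minimal sketch of what this call actually returns on the data above (with `aggfunc="sum"` instead of the dict; `table` is just an illustrative name): `margins=True` appends one overall `Total` column and one `Total` row, not a separate subtotal for each value of `C`.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# margins=True adds a single grand-total column and a single grand-total row;
# it does not add one subtotal column per group of the first column level (C).
table = pd.pivot_table(df, values=["D"], index=["A"], columns=["C", "B"],
                       aggfunc="sum", margins=True, fill_value=0,
                       margins_name="Total")
print(table)
print(table.shape)  # 3 rows (bar, foo, Total) x 5 columns (4 C/B pairs + 1 margin)
```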


The output should be:

      D
C     large       Total small       Total
B       one   two         one   two
A
bar       4     7    11     5     6    11
foo       4     0     4     1     6     7
Total     8     7    15     6    12    33

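In my opinion, it is better to add a new `Total` value to the second level of the MultiIndex, so that filtering by the first level is still possible.

To get the correct column order, create an ordered categorical column with `Total` appended to its categories: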

# Ordered categorical: 'Total' is the last category, so it sorts after 'one' and 'two'
df['B'] = pd.Categorical(df['B'],
                         categories=df['B'].unique().tolist() + ['Total'],
                         ordered=True)
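Why the ordered categorical matters, as a small sketch: plain strings sort by code point, and uppercase `"T"` comes before lowercase `"o"`, so `'Total'` would land first; an ordered categorical sorts by the order of its categories instead.

```python
import pandas as pd

labels = ["one", "two", "Total"]

# Plain strings sort by code point: "T" (84) comes before "o" (111).
plain = sorted(labels)

# An ordered categorical sorts by the position of each value in `categories`.
cat = pd.Series(pd.Categorical(labels, categories=["one", "two", "Total"], ordered=True))
ordered = cat.sort_values().tolist()

print(plain)    # ['Total', 'one', 'two']
print(ordered)  # ['one', 'two', 'Total']
```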

To avoid a three-level column MultiIndex, change the aggregated values from `['D']` to `'D'`:

df1 = pd.pivot_table(df,
                     values='D',
                     index=['A'],
                     columns=['C', 'B'],
                     aggfunc='sum',
                     fill_value=0,
                     observed=True)  # keep only categories present in the data (no empty 'Total' columns)
print(df1)
C   large     small    
B     one two   one two
A                      
bar     4   7     5   6
foo     4   0     1   6
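A quick sketch of the difference (on the plain, non-categorical `df`; `three` and `two` are illustrative names): passing `values` as a list keeps the value name `D` as an extra top level of the column MultiIndex, while a scalar drops it.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# A list for `values` keeps "D" as the top column level: (D, C, B).
three = pd.pivot_table(df, values=["D"], index=["A"], columns=["C", "B"],
                       aggfunc="sum", fill_value=0)
# A scalar `values` drops that level: (C, B).
two = pd.pivot_table(df, values="D", index=["A"], columns=["C", "B"],
                     aggfunc="sum", fill_value=0)

print(three.columns.nlevels, two.columns.nlevels)  # 3 2
```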

Then create a new DataFrame `df2` holding the subtotals, computed with `sum`:
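One possible way to build `df2`, sketched under the assumption that `df1` is the two-level `(C, B)` pivot shown above (rebuilt here without the categorical step so the snippet runs on its own). It sums each first-level group and labels the result `Total` on the second level; the transpose-and-`groupby` approach is my choice and may differ from the original answer's code.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})
df1 = pd.pivot_table(df, values="D", index=["A"], columns=["C", "B"],
                     aggfunc="sum", fill_value=0)

# Sum the columns within each first-level group (large, small).
# Transposing lets us group rows instead of using the deprecated axis=1 groupby.
df2 = df1.T.groupby(level="C").sum().T

# Put the subtotals under a second-level label "Total" so they line up with df1.
df2.columns = pd.MultiIndex.from_product([df2.columns, ["Total"]], names=["C", "B"])
print(df2)
```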

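Then join the two frames, sort the columns so that `Total` ends up in the last position of each group, and finally append a `sum` row: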

# df2 holds the per-group subtotals from the previous step
df = df1.join(df2).sort_index(axis=1)
df.loc['Total'] = df.sum()
print(df)
C     large           small          
B       one two Total   one two Total
A                                    
bar       4   7    11     5   6    11
foo       4   0     4     1   6     7
Total     8   7    15     6  12    18
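An end-to-end sketch of the same steps that fixes the column order with an explicit `reindex` instead of `sort_index`, so it does not depend on the second column level keeping its categorical dtype through the `join`. The names `b_values`, `order` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Two-level columns (C, B): a scalar `values` avoids a third level.
df1 = pd.pivot_table(df, values="D", index=["A"], columns=["C", "B"],
                     aggfunc="sum", fill_value=0)

# Per-group subtotals labelled "Total" on the second level.
df2 = df1.T.groupby(level="C").sum().T
df2.columns = pd.MultiIndex.from_product([df2.columns, ["Total"]], names=["C", "B"])

# Explicit column order: each C group's B values, followed by its Total.
b_values = list(df1.columns.get_level_values("B").unique()) + ["Total"]
order = pd.MultiIndex.from_tuples(
    [(c, b) for c in df2.columns.get_level_values("C") for b in b_values],
    names=["C", "B"],
)
out = df1.join(df2).reindex(columns=order)

# Column totals as a final row.
out.loc["Total"] = out.sum()
print(out)
```

With `reindex`, a `(C, B)` pair that had no data at all would appear as a column of NaN rather than being silently dropped.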

Is the last value correct?
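The question's expected output shows 33 in the bottom-right cell, while the answer's table shows 18. A small sketch computing both numbers directly from the question's `df` (variable names are illustrative): 18 is the subtotal of the `small` group, and 33 is the sum of all `D` values, i.e. the grand total that `margins=True` reports.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Subtotal of the "small" group across all rows.
small_total = df.loc[df["C"] == "small", "D"].sum()
# Grand total of D over the whole frame.
grand_total = df["D"].sum()
print(small_total, grand_total)  # 18 33
```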