Python pivot table with multi-level columns

Given the code below:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
    'odd1': pd.Series([1, 1, 2, 1, 2]),
    'odd2': pd.Series([6, 7, 8, 9, 10])})

grpd = df.groupby(['clients', 'odd1']).agg({
    'odd2': [np.sum, np.average]
}).reset_index('clients').reset_index('odd1')

>> grpd
   odd1 clients  odd2         
                  sum  average
0     1       A    13      6.5
1     2       A     8      8.0
2     1       B     9      9.0
3     2       B    10     10.0
I would like to create a pivot table like this:

       | odd1    | odd1    | ...... | odd1    |
-------|---------|---------|--------|---------|
clients| average | average | .....  | average |
The desired output is:

clients |   1       2      
--------|------------------
A       |   6.5     8.0    
B       |   9.0     10.0
If the columns were not multi-level, this would work:

grpd.pivot(index='clients', columns='odd1', values='odd2')
but I don't know how to make it work with multi-level columns.
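The single-level pivot fails because grpd's columns are a two-level MultiIndex: 'odd2' names a group of two sub-columns rather than one column. A minimal sketch of one workaround, assuming it is acceptable to flatten the column labels into plain strings first. It uses 'sum' and 'mean' as string aggregation names ('mean' gives the same values as an unweighted np.average), so the flattened labels such as 'odd2_mean' are illustrative, not from the original.

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'odd1': [1, 1, 2, 1, 2],
    'odd2': [6, 7, 8, 9, 10]})

# Same shape as grpd in the question; 'mean' stands in for np.average.
grpd = (df.groupby(['clients', 'odd1'])
          .agg({'odd2': ['sum', 'mean']})
          .reset_index('clients')
          .reset_index('odd1'))

# The column labels are tuples such as ('clients', '') and ('odd2', 'mean').
print(grpd.columns.tolist())

# Flatten them by joining the non-empty parts: 'clients', 'odd2_mean', ...
grpd.columns = ['_'.join(part for part in col if part) for col in grpd.columns]

# With flat labels, the single-level pivot from the question works.
result = grpd.pivot(index='clients', columns='odd1', values='odd2_mean')
print(result)
```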

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
    'odd1': pd.Series([1, 1, 2, 1, 2]),
    'odd2': pd.Series([6, 7, 8, 9, 10])})

aggd = df.groupby(['clients', 'odd1']).agg({
    'odd2': [np.sum, np.average]})

print(aggd.unstack(['odd1']).loc[:, ('odd2','average')])
yields

odd1       1   2
clients         
A        6.5   8
B        9.0  10
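Recent pandas releases may emit a FutureWarning when NumPy reductions such as np.sum are passed to agg. A sketch of the same answer using string aggregation names instead, assuming that is the preferred spelling in your pandas version. Because 'mean' replaces np.average here (identical values when no weights are given), the selected column is ('odd2', 'mean') rather than ('odd2', 'average').

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'odd1': [1, 1, 2, 1, 2],
    'odd2': [6, 7, 8, 9, 10]})

# String names instead of np.sum / np.average.
aggd = df.groupby(['clients', 'odd1']).agg({'odd2': ['sum', 'mean']})

# Move odd1 from the row index into the columns, then keep only the mean block.
result = aggd.unstack('odd1').loc[:, ('odd2', 'mean')]
print(result)
```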

Explanation: one of the intermediate steps in building grpd is

aggd = df.groupby(['clients', 'odd1']).agg({
    'odd2': [np.sum, np.average]})
which looks like this:

In [52]: aggd
Out[52]: 
             odd2        
              sum average
clients odd1             
A       1      13     6.5
        2       8     8.0
B       1       9     9.0
        2      10    10.0
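The printed table hides the fact that aggd has a MultiIndex on both axes. A small sketch that inspects those labels directly, again assuming 'sum' and 'mean' as the aggregation names in place of the NumPy functions.

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'odd1': [1, 1, 2, 1, 2],
    'odd2': [6, 7, 8, 9, 10]})

aggd = df.groupby(['clients', 'odd1']).agg({'odd2': ['sum', 'mean']})

# Row labels: a MultiIndex built from the two groupby keys.
print(list(aggd.index.names))   # ['clients', 'odd1']
# Column labels: (source column, aggregation) pairs.
print(aggd.columns.tolist())    # [('odd2', 'sum'), ('odd2', 'mean')]
# One cell is addressed by a row tuple plus a column tuple.
print(aggd.loc[('A', 1), ('odd2', 'mean')])  # 6.5
```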
Comparing aggd visually with the desired result

odd1       1   2
clients         
A        6.5   8
B        9.0  10
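shows that the odd1 index level needs to become the column index. That operation (moving index labels to column labels) is exactly what unstack does. So it is natural to unstack aggd: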

In [53]: aggd.unstack(['odd1'])
Out[53]: 
        odd2                
         sum     average    
odd1       1   2       1   2
clients                     
A         13   8     6.5   8
B          9  10     9.0  10
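The two steps can also be swapped: select the one column that is needed first, then unstack it. A sketch of that variant, assuming 'mean' as the aggregation name; unstacking a single Series avoids building the sum columns only to discard them.

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'odd1': [1, 1, 2, 1, 2],
    'odd2': [6, 7, 8, 9, 10]})

aggd = df.groupby(['clients', 'odd1']).agg({'odd2': ['sum', 'mean']})

# Select the needed column first: a Series indexed by (clients, odd1).
mean_by_group = aggd[('odd2', 'mean')]
# Unstacking the Series turns the odd1 level into DataFrame columns.
result = mean_by_group.unstack('odd1')
print(result)
```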
Now it is easy to see that we only want to select the average columns. This can be done with loc:

In [54]: aggd.unstack(['odd1']).loc[:, ('odd2','average')]
Out[54]: 
odd1       1   2
clients         
A        6.5   8
B        9.0  10
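Two alternatives that are not part of the original answer, sketched under the same 'mean' naming assumption: DataFrame.xs takes the same cross-section of the column MultiIndex as the loc call, and DataFrame.pivot_table can produce the desired table straight from df, doing the grouping and averaging itself.

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'odd1': [1, 1, 2, 1, 2],
    'odd2': [6, 7, 8, 9, 10]})

aggd = df.groupby(['clients', 'odd1']).agg({'odd2': ['sum', 'mean']})

# Option 1: cross-section on the first two column levels; the selected
# levels are dropped, leaving odd1 as the only column level.
via_xs = aggd.unstack('odd1').xs(('odd2', 'mean'), axis=1)

# Option 2: skip the explicit groupby; pivot_table averages odd2 for each
# (clients, odd1) pair and returns single-level columns because values
# is a single label.
via_pivot_table = df.pivot_table(index='clients', columns='odd1',
                                 values='odd2', aggfunc='mean')

print(via_xs)
print(via_pivot_table)
```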