Python: filter, then rank a multi-index DataFrame

python pandas

I have a pandas DataFrame with two columns (Col1 and Col2) and a multi-level index (Date and Symbol), as follows:

                 Col1    Col2
Date     Symbol     
2015-12-01  AAA  0.45    0.53
            BBB -1.02   -0.57
            CCC -0.41    0.30
2015-11-02  AAA  0.59   -0.42
            BBB -2.16   -0.77
            CCC -1.02    1.09
2015-10-01  AAA -0.44   -0.88
            BBB  0.52    0.27
            CCC -1.76    0.63

The code to reproduce this DataFrame is:

    import pandas as pd

    # Build the sample data, then move Date and Symbol into a two-level index
    df = pd.DataFrame({'Date': ['2015-12-01']*3 + ['2015-11-02']*3 + ['2015-10-01']*3,
                       'Symbol': ['AAA', 'BBB', 'CCC']*3,
                       'Col1': [0.45, -1.02, -0.41, 0.59, -2.16, -1.02, -0.44, 0.52, -1.76],
                       'Col2': [0.53, -0.57, 0.3, -0.42, -0.77, 1.09, -0.88, 0.27, 0.63]}
                      ).set_index(['Date', 'Symbol'])

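Within each date, I am trying to select the top n rows (2 in this example) based on the largest values in Col1, and then rank those rows by their values in Col2 (largest value == 1, second largest == 2, and so on). With the result added to the original DataFrame as a new column, the final DataFrame should look like this: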

                 Col1   Col2    Rank
Date     Symbol         
2015-12-01  AAA  0.45    0.53   1
            CCC -0.41    0.30   2
            BBB -1.02   -0.57   NaN
2015-11-02  CCC -1.02    1.09   1
            AAA  0.59   -0.42   2
            BBB -2.16   -0.77   NaN
2015-10-01  BBB  0.52    0.27   1
            AAA -0.44   -0.88   2
            CCC -1.76    0.63   NaN
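The selection-then-rank logic described above can also be written without `apply()`. This is an alternative technique, not the method used in the answer below: it ranks Col1 inside each Date to find the top n rows, blanks out Col2 everywhere else with `Series.where`, and then ranks the remaining Col2 values per Date. The names `n`, `col1_rank` and `col2_top` are hypothetical, and breaking Col1 ties by row order (`method='first'`) is an assumption the question does not specify.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2015-12-01']*3 + ['2015-11-02']*3 + ['2015-10-01']*3,
                   'Symbol': ['AAA', 'BBB', 'CCC']*3,
                   'Col1': [0.45, -1.02, -0.41, 0.59, -2.16, -1.02, -0.44, 0.52, -1.76],
                   'Col2': [0.53, -0.57, 0.3, -0.42, -0.77, 1.09, -0.88, 0.27, 0.63]}
                  ).set_index(['Date', 'Symbol'])

n = 2  # hypothetical parameter: rows to keep per Date

# 1. Rank Col1 within each Date, largest == 1 (ties broken by row order)
col1_rank = df.groupby(level='Date')['Col1'].rank(ascending=False, method='first')

# 2. Keep Col2 only for the top-n rows; every other row becomes NaN
col2_top = df['Col2'].where(col1_rank <= n)

# 3. Rank the kept Col2 values within each Date (largest == 1); NaN stays NaN
df['Rank'] = col2_top.groupby(level='Date').rank(ascending=False)
print(df)
```

This only adds the Rank column (as floats, because of the NaNs); it keeps the original row order, so reordering rows to match the layout above is a separate sort step.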

I have tried using groupby and rank, but I am struggling to get the indexing right.

For example:

df.reset_index()
Date         
2015-10-01  7    0.52
            6   -0.44
2015-11-02  3    0.59
            5   -1.02
2015-12-01  0    0.45
            2   -0.41

But I can't figure out how to rank these rows and put the results back into the DataFrame.
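For reference, here is one way the `reset_index()` attempt could be carried through. It is a hedged sketch, not the asker's code. After `reset_index()` every row has a unique integer label, and the last index level of the `nlargest` result holds exactly those labels (the 7, 6, 3, ... shown in the output above). Those labels can select the top rows, rank Col2 among them, and assign the ranks back by label. The names `n`, `flat`, `top`, `row_labels`, `ranks` and `result` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2015-12-01']*3 + ['2015-11-02']*3 + ['2015-10-01']*3,
                   'Symbol': ['AAA', 'BBB', 'CCC']*3,
                   'Col1': [0.45, -1.02, -0.41, 0.59, -2.16, -1.02, -0.44, 0.52, -1.76],
                   'Col2': [0.53, -0.57, 0.3, -0.42, -0.77, 1.09, -0.88, 0.27, 0.63]}
                  ).set_index(['Date', 'Symbol'])

n = 2  # hypothetical parameter: rows to keep per Date

# Flat copy: Date and Symbol become columns, rows get integer labels 0..8
flat = df.reset_index()

# Top-n Col1 values per Date; the last index level holds the flat row labels
top = flat.groupby('Date')['Col1'].nlargest(n)
row_labels = top.index.get_level_values(-1)

# Rank Col2 (largest == 1) among the selected rows only, per Date
ranks = flat.loc[row_labels].groupby('Date')['Col2'].rank(ascending=False)

# Assignment aligns on the integer labels; unselected rows become NaN
flat['Rank'] = ranks

# Restore the original two-level index
result = flat.set_index(['Date', 'Symbol'])
print(result)
```

The key design point is that the integer labels created by `reset_index()` act as a join key, so nothing has to be matched back by Symbol.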
You could do the following:

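    # Col1 for the 2 largest values within each Date; all other rows get NaN.
    # apply() prepends an extra Date level, which reset_index(0, drop=True) removes.
    df['largest'] = df.groupby(level='Date').apply(lambda x: x.Col1.nlargest(2)).reset_index(0, drop=True)
    # Rank Col2 (largest == 1) among the rows kept above; the other rows get NaN
    df['ranked'] = df.groupby(level='Date').apply(lambda x: x.dropna(subset=['largest']).Col2.rank(ascending=False)).reset_index(0, drop=True)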

to get:
                   Col1  Col2  largest  ranked
Date       Symbol                             
2015-12-01 AAA     0.45  0.53     0.45       1
           BBB    -1.02 -0.57      NaN     NaN
           CCC    -0.41  0.30    -0.41       2
2015-11-02 AAA     0.59 -0.42     0.59       2
           BBB    -2.16 -0.77      NaN     NaN
           CCC    -1.02  1.09    -1.02       1
2015-10-01 AAA    -0.44 -0.88    -0.44       2
           BBB     0.52  0.27     0.52       1
           CCC    -1.76  0.63      NaN     NaN
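The answer's output keeps the original row order, whereas the question's desired output lists the rows by rank within each date. Below is a sketch that assumes a reasonably recent pandas version. It computes the same two columns, but passes `group_keys=False` to `groupby` so that `apply()` does not add the extra Date level and no `reset_index(0, drop=True)` is needed. It then sorts by the Date index level and the rank. The variable name `ordered` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2015-12-01']*3 + ['2015-11-02']*3 + ['2015-10-01']*3,
                   'Symbol': ['AAA', 'BBB', 'CCC']*3,
                   'Col1': [0.45, -1.02, -0.41, 0.59, -2.16, -1.02, -0.44, 0.52, -1.76],
                   'Col2': [0.53, -0.57, 0.3, -0.42, -0.77, 1.09, -0.88, 0.27, 0.63]}
                  ).set_index(['Date', 'Symbol'])

# group_keys=False: apply() concatenates the per-group results without
# prepending the Date key, so the index is already (Date, Symbol)
df['largest'] = df.groupby(level='Date', group_keys=False).apply(
    lambda g: g['Col1'].nlargest(2))
df['ranked'] = df.groupby(level='Date', group_keys=False).apply(
    lambda g: g.dropna(subset=['largest'])['Col2'].rank(ascending=False))

# Newest date first, then by rank within each date; NaN ranks sort last
# (na_position='last' is the default). 'Date' is an index level name,
# which sort_values accepts in `by` alongside column names.
ordered = df.sort_values(['Date', 'ranked'], ascending=[False, True])
print(ordered)
```

The dates are ISO-formatted strings here, so sorting them lexicographically gives the same order as sorting them chronologically.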

Thank you for the elegant solution; this is exactly what I was trying to achieve.