Python: pandas MultiIndex filtering based on statistics


I have a DataFrame res with the following structure:

Field                      A        B
Security date
EFA      2001-08-17      NaN  29.4944
         2001-08-20   0.1983  29.5529
         2001-08-21  -0.2374  29.4827
         2001-08-22   1.2297  29.8453
         2001-08-23  -0.4702  29.7049
         2001-08-24   1.3622  30.1096
         2001-08-27  -0.1787  30.0558
         2001-08-28  -1.1440  29.7119
         2001-08-29  -0.4566  29.5763
         2001-08-30  -1.4235  29.1553
         2001-08-31   0.2407  29.2254
         2001-09-04  -2.2809  28.5588
         2001-09-05  -0.6143  28.3834
         2001-09-06  -2.2662  27.7402
         2001-09-07  -0.5902  27.5765
         2001-09-10  -1.1450  27.2607
         2001-09-17  -4.3758  26.0678
         2001-09-18  -0.8075  25.8573
         2001-09-19  -0.2714  25.7872
         2001-09-20  -4.3537  24.6644
         2001-09-21  -2.7975  23.9745
         2001-09-24   4.6341  25.0855
         2001-09-25   1.1655  25.3778
         2001-09-26   0.5069  25.5065
         2001-09-27   1.5773  25.9088
         2001-09-28   1.9500  26.4140
         2001-10-01  -0.5402  26.2713
         2001-10-02   0.3530  26.3641
         2001-10-03   1.0218  26.6334
         2001-10-04   1.0642  26.9169

and the following index:

MultiIndex(levels=[[u'EFA', u'IVV', u'SPY'], [2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00, 2001-01-11 00:00:00, 2001-01-12 00:00:00, 2001-01-16 00:00:00, 2001-01-17 00:00:00, 2001-01-18 00:00:00, 2001-01-19 00:00:00, 2001-01-22 00:00:00, 2001-01-23 00:00:00, 2001-01-24 00:00:00, 2001-01-25 00:00:00, 2001-01-26 00:00:00, 2001-01-29 00:00:00, 2001-01-30 00:00:00, 2001-01-31 00:00:00, 2001-02-01 00:00:00, 2001-02-02 00:00:00, 2001-02-05 00:00:00, 2001-02-06 00:00:00, 2001-02-07 00:00:00, 2001-02-08 00:00:00, 2001-02-09 00:00:00, 2001-02-12 00:00:00, 2001-02-13 00:00:00, 2001-02-14 00:00:00, 2001-02-15 00:00:00, 2001-02-16 00:00:00, 2001-02-20 00:00:00, 2001-02-21 00:00:00, 2001-02-22 00:00:00,...]],               names=[u'Security', u'date'])

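I would like to filter on the mean of A

I'm not sure exactly what you're looking for, but here are two options. The first is probably the simplest approach IMO (using groupby/transform), but the second may be closer (I think) to what you asked for.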

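Method 1: create a variable holding the mean of A for each security, using transform so that the result conforms to the DataFrame's index: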

>>> res['mean_A'] = res.groupby(level=0)['A'].transform('mean')

                          A        B   mean_A
security date                                
efa      2001-08-20  0.1983  29.5529 -0.07536
         2001-08-21 -0.2374  29.4827 -0.07536
         2001-08-22 -1.2297  29.8453 -0.07536
         2001-08-23 -0.4702  29.7049 -0.07536
         2001-08-24  1.3622  30.1096 -0.07536
ivv      2001-08-20  0.1983  29.5529  0.41652
         2001-08-21 -0.2374  29.4827  0.41652
         2001-08-22  1.2297  29.8453  0.41652
         2001-08-23 -0.4702  29.7049  0.41652
         2001-08-24  1.3622  30.1096  0.41652
Then standard boolean indexing is straightforward:

>>> res[ res['mean_A'] < 0 ]

                          A        B   mean_A
security date                                
efa      2001-08-20  0.1983  29.5529 -0.07536
         2001-08-21 -0.2374  29.4827 -0.07536
         2001-08-22 -1.2297  29.8453 -0.07536
         2001-08-23 -0.4702  29.7049 -0.07536
         2001-08-24  1.3622  30.1096 -0.07536
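
For anyone who wants to run Method 1 end to end, here is a minimal sketch. The sample frame is an assumption reconstructed from the output shown above (two securities, five dates, with the B values simply repeated); it is not the asker's full data. The key point is that transform('mean') returns one value per original row, so the boolean mask lines up with res without any extra alignment step.

```python
import pandas as pd

# Assumed sample data mirroring the small tables above (not the asker's full frame).
dates = pd.to_datetime(
    ["2001-08-20", "2001-08-21", "2001-08-22", "2001-08-23", "2001-08-24"]
)
index = pd.MultiIndex.from_product(
    [["efa", "ivv"], dates], names=["security", "date"]
)
res = pd.DataFrame(
    {
        "A": [0.1983, -0.2374, -1.2297, -0.4702, 1.3622,
              0.1983, -0.2374, 1.2297, -0.4702, 1.3622],
        "B": [29.5529, 29.4827, 29.8453, 29.7049, 30.1096] * 2,
    },
    index=index,
)

# Broadcast each security's mean of A back onto every one of its rows.
res["mean_A"] = res.groupby(level="security")["A"].transform("mean")

# Keep only the rows whose security has a negative mean of A.
negative = res[res["mean_A"] < 0]
print(negative)
```

If the helper column is not wanted afterwards, it can be dropped with negative.drop(columns="mean_A").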
Method 2: alternatively, if you start from 'f' and need to do it that way, you could do something like this (note that I use groupby rather than stack just because it feels more natural to me, but it doesn't matter):

>>> f = (res.groupby(level=0)['A'].mean() < 0)
>>> res[ res.reset_index()['security'].map(f).values ]

                          A        B
security date                       
efa      2001-08-20  0.1983  29.5529
         2001-08-21 -0.2374  29.4827
         2001-08-22 -1.2297  29.8453
         2001-08-23 -0.4702  29.7049
         2001-08-24  1.3622  30.1096
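
Method 2 can be tried the same way. The sketch below rebuilds an assumed sample frame (reconstructed from the output above, not the asker's data), applies the reset_index()/map approach from the answer, and then shows two variations that are not part of the original answer: get_level_values(...).isin(...), which reads the level straight off the index, and groupby(...).filter(...), which keeps or drops whole groups. On this sample all three should select the same rows.

```python
import pandas as pd

# Assumed sample data (two securities, five dates each).
dates = pd.to_datetime(
    ["2001-08-20", "2001-08-21", "2001-08-22", "2001-08-23", "2001-08-24"]
)
index = pd.MultiIndex.from_product(
    [["efa", "ivv"], dates], names=["security", "date"]
)
res = pd.DataFrame(
    {
        "A": [0.1983, -0.2374, -1.2297, -0.4702, 1.3622,
              0.1983, -0.2374, 1.2297, -0.4702, 1.3622],
        "B": [29.5529, 29.4827, 29.8453, 29.7049, 30.1096] * 2,
    },
    index=index,
)

# One boolean per security: True where the mean of A is negative.
f = res.groupby(level="security")["A"].mean() < 0

# Approach from the answer: map each row's security label through f.
by_reset = res[res.reset_index()["security"].map(f).values]

# Variation (an assumption, not from the answer): test level membership directly.
keep = f[f].index  # labels of the securities whose flag is True
by_isin = res[res.index.get_level_values("security").isin(keep)]

# Variation (also not from the answer): let groupby keep or drop whole groups.
by_filter = res.groupby(level="security").filter(lambda g: g["A"].mean() < 0)

print(by_reset)
```

The isin form avoids building a flattened copy of the frame just to read one index level; filter is the shortest to write but calls a Python function once per group.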


FYI, I changed the answer a little bit.

@JohnE Thank you, John. Honestly, I find MultiIndex quite confusing when it comes to accessing members. Why do this the MultiIndex way at all? Is there some magic I'm missing?

Haha, I very much agree. I tend to avoid MultiIndex 95% of the time, and I think it's a good example of "flat is better than nested" from the Zen of Python. I mostly use it together with stack/unstack. There are other advantages, but I can't describe them well. In any case, you're never more than one reset_index() away from a simple index.
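
To make the two points from the comments concrete, here is a small sketch on an assumed sample frame (values reconstructed from the answer's output, not the asker's data). The first half flattens the MultiIndex with reset_index(), filters on ordinary columns, and restores the index with set_index(). The second half uses unstack to pivot securities into columns, which is the stack/unstack use the comment mentions. The variable names are illustrative only.

```python
import pandas as pd

# Assumed sample data (two securities, five dates each).
dates = pd.to_datetime(
    ["2001-08-20", "2001-08-21", "2001-08-22", "2001-08-23", "2001-08-24"]
)
index = pd.MultiIndex.from_product(
    [["efa", "ivv"], dates], names=["security", "date"]
)
res = pd.DataFrame(
    {
        "A": [0.1983, -0.2374, -1.2297, -0.4702, 1.3622,
              0.1983, -0.2374, 1.2297, -0.4702, 1.3622],
        "B": [29.5529, 29.4827, 29.8453, 29.7049, 30.1096] * 2,
    },
    index=index,
)

# Flat route: 'security' and 'date' become ordinary columns.
flat = res.reset_index()
flat_neg = flat[flat.groupby("security")["A"].transform("mean") < 0]
restored = flat_neg.set_index(["security", "date"])  # back to a MultiIndex

# Wide route: dates as rows, one column of A per security.
wide = res["A"].unstack(level="security")
negative_cols = wide.columns[wide.mean() < 0]  # securities with a negative mean
print(wide[negative_cols])
```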