Python - Pandas - Filter out columns based on the row mean

I have a dataframe with multiple columns and a date index:

TIME           A         B          C              D              E 
---------------------------------------------------------------------    
2015-03-01   0.74      -0.70       2.62           2.64           3.43   
2015-03-02   0.15      -1.28       0.56           400.58         0.08   
2015-03-03  -0.18      -3.82       0.21           0.22          -0.32   
2015-03-04  -1.45      -1.26       0.74           0.76          -0.09   
2015-03-05 -13.01     -12.88     -16.46         -16.45         -11.67   
2015-03-06 -47.73     -57.09     -55.45         -55.51         -55.15   
2015-03-07  -2.31      -3.57     -36.24         -39.50           2.87   
2015-03-08   0.64       0.34       1.76           1.75           1.51
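To reproduce the example, here is a sketch that builds the frame above. It assumes TIME starts out as an ordinary column of date strings rather than as the index:

```python
import pandas as pd

# Values from the table above; TIME is assumed to be a plain column of date
# strings, so it still has to be moved into the index later.
df = pd.DataFrame({
    'TIME': ['2015-03-01', '2015-03-02', '2015-03-03', '2015-03-04',
             '2015-03-05', '2015-03-06', '2015-03-07', '2015-03-08'],
    'A': [0.74, 0.15, -0.18, -1.45, -13.01, -47.73, -2.31, 0.64],
    'B': [-0.70, -1.28, -3.82, -1.26, -12.88, -57.09, -3.57, 0.34],
    'C': [2.62, 0.56, 0.21, 0.74, -16.46, -55.45, -36.24, 1.76],
    'D': [2.64, 400.58, 0.22, 0.76, -16.45, -55.51, -39.50, 1.75],
    'E': [3.43, 0.08, -0.32, -0.09, -11.67, -55.15, 2.87, 1.51],
})
print(df)
```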

I want to drop every column that has at least one entry whose value is not within 100 of its row mean.

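In other words, if the mean of all columns on 2015-03-02 is 80.018, I only want to keep the columns whose value on that date lies between -19.982 and 180.018. So in this example I would exclude column D, because its value on that date (400.58) falls outside that range.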
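The arithmetic behind that range can be checked for the 2015-03-02 row in a few lines of plain Python. This sketch treats the bounds as inclusive, which the question does not state explicitly:

```python
# Row 2015-03-02 from the question, in column order A, B, C, D, E
row = [0.15, -1.28, 0.56, 400.58, 0.08]

mean = sum(row) / len(row)           # 400.09 / 5 = 80.018
low, high = mean - 100, mean + 100   # roughly -19.982 and 180.018

# Names of the columns whose value lies outside [low, high] on this date
outside = [name for name, v in zip('ABCDE', row) if not low <= v <= high]
print(round(mean, 3), round(low, 3), round(high, 3), outside)
```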

I also don't want to iterate over the rows of the dataframe, so I'm looking for a Pythonic, vectorized solution.

I think you need:

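import pandas as pd

# if necessary, move TIME into the index and convert it to a DatetimeIndex
df = df.set_index('TIME')
df.index = pd.to_datetime(df.index)

# mean of each row
s = df.mean(axis=1)
# boolean mask: True where a value is more than 100 above or below its row mean, chained by OR (|)
m = df.gt(s + 100, axis=0) | df.lt(s - 100, axis=0)

# remove columns by condition: any() flags columns with at least one True, ~ inverts that to the columns to keep
df = df.loc[:, ~m.any()]
print (df)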
                A      B      C      E
TIME                                  
2015-03-01   0.74  -0.70   2.62   3.43
2015-03-02   0.15  -1.28   0.56   0.08
2015-03-03  -0.18  -3.82   0.21  -0.32
2015-03-04  -1.45  -1.26   0.74  -0.09
2015-03-05 -13.01 -12.88 -16.46 -11.67
2015-03-06 -47.73 -57.09 -55.45 -55.15
2015-03-07  -2.31  -3.57 -36.24   2.87
2015-03-08   0.64   0.34   1.76   1.51
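The same steps can be wrapped in a small reusable function. `drop_outlier_columns` and its `threshold` parameter are hypothetical names, not part of pandas. The sketch uses only the first three rows of the question's data, with a DatetimeIndex already in place:

```python
import pandas as pd


def drop_outlier_columns(df: pd.DataFrame, threshold: float = 100) -> pd.DataFrame:
    """Hypothetical helper: drop every column holding at least one value
    more than `threshold` above or below the mean of its row."""
    s = df.mean(axis=1)                                   # row means
    m = df.gt(s + threshold, axis=0) | df.lt(s - threshold, axis=0)
    return df.loc[:, ~m.any()]                            # keep columns with no flag


df = pd.DataFrame(
    {
        'A': [0.74, 0.15, -0.18],
        'B': [-0.70, -1.28, -3.82],
        'C': [2.62, 0.56, 0.21],
        'D': [2.64, 400.58, 0.22],
        'E': [3.43, 0.08, -0.32],
    },
    index=pd.to_datetime(['2015-03-01', '2015-03-02', '2015-03-03']),
)
result = drop_outlier_columns(df)
print(result)
```

Making the threshold a parameter lets the same check be reused with a tighter or looser band without editing the mask expression.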

Details:

print (m)
                A      B      C      D      E
TIME                                         
2015-03-01  False  False  False  False  False
2015-03-02  False  False  False   True  False
2015-03-03  False  False  False  False  False
2015-03-04  False  False  False  False  False
2015-03-05  False  False  False  False  False
2015-03-06  False  False  False  False  False
2015-03-07  False  False  False  False  False
2015-03-08  False  False  False  False  False
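The column selection works because `m.any()` reduces the mask column by column (its default `axis=0`), giving one boolean per column. `~` then flips it, so `.loc` keeps only the columns with no out-of-range value. Here is a small sketch with a hand-built copy of the mask printed above, where only D on 2015-03-02 is True:

```python
import pandas as pd

# Hand-built copy of the mask above: only D on 2015-03-02 is out of range
idx = pd.date_range('2015-03-01', periods=8, name='TIME')
m = pd.DataFrame(False, index=idx, columns=list('ABCDE'))
m.loc[pd.Timestamp('2015-03-02'), 'D'] = True

per_column = m.any()   # one bool per column: True if any row in it is True
keep = ~per_column     # True for the columns that should survive
print(per_column)
print(list(keep[keep].index))
```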

Another solution:

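# boolean mask: True where a value is strictly within 100 of its row mean, chained by AND (&)
m = df.lt(s + 100, axis=0) & df.gt(s - 100, axis=0)

# keep only the columns in which every value is True
df = df.loc[:, m.all()]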

I think you misunderstood my question. My example is fine.

print (m)
               A     B     C      D     E
TIME                                     
2015-03-01  True  True  True   True  True
2015-03-02  True  True  True  False  True
2015-03-03  True  True  True   True  True
2015-03-04  True  True  True   True  True
2015-03-05  True  True  True   True  True
2015-03-06  True  True  True   True  True
2015-03-07  True  True  True   True  True
2015-03-08  True  True  True   True  True
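For this data, both answers keep the same columns. The sketch below checks this on the question's values, assuming a daily DatetimeIndex named TIME. It uses `le`/`ge` for the second mask, which makes that mask the exact complement of the first one (for data without NaN). The strict `lt`/`gt` version would also drop a column holding a value exactly 100 away from its row mean, which the first answer keeps.

```python
import pandas as pd

# Values from the question, with a daily DatetimeIndex named TIME
df = pd.DataFrame(
    {
        'A': [0.74, 0.15, -0.18, -1.45, -13.01, -47.73, -2.31, 0.64],
        'B': [-0.70, -1.28, -3.82, -1.26, -12.88, -57.09, -3.57, 0.34],
        'C': [2.62, 0.56, 0.21, 0.74, -16.46, -55.45, -36.24, 1.76],
        'D': [2.64, 400.58, 0.22, 0.76, -16.45, -55.51, -39.50, 1.75],
        'E': [3.43, 0.08, -0.32, -0.09, -11.67, -55.15, 2.87, 1.51],
    },
    index=pd.date_range('2015-03-01', periods=8, name='TIME'),
)
s = df.mean(axis=1)  # row means

# First answer: flag values outside the band, drop columns with any flag
outside = df.gt(s + 100, axis=0) | df.lt(s - 100, axis=0)
first = df.loc[:, ~outside.any()]

# Second answer with inclusive bounds: flag values inside the band,
# keep the columns where every value is inside
inside = df.le(s + 100, axis=0) & df.ge(s - 100, axis=0)
second = df.loc[:, inside.all()]

# Without NaN, `inside` is exactly the negation of `outside`
print((inside == ~outside).all().all())
print(list(first.columns), list(second.columns))
```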