Pandas: select DataFrame columns based on a condition


I have a DataFrame containing UK election results, with one column per party. The DataFrame looks like this:

In[107]: Results.columns
Out[107]: 
Index(['Press Association ID Number', 'Constituency Name', 'Region', 'Country',
       'Constituency ID', 'Constituency Type', 'Election Year', 'Electorate',
       ' Total number of valid votes counted ', 'Unnamed: 9',
       ...
       'Wessex Reg', 'Whig', 'Wigan', 'Worth', 'WP', 'WRP', 'WVPTFP', 'Yorks',
       'Young', 'Zeb'],
      dtype='object', length=147)

The columns containing the votes for the individual parties are

    Results.ix[:, 'Unnamed: 9':]

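Most of these parties receive very few votes in any constituency, so I would like to exclude them. Is there a way (other than iterating over every row and column myself) to return only the columns that meet a given condition, for example having at least one value > 1000? Ideally, I would like to be able to specify something like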

    Results.ix[:, 'Unnamed: 9': > 1000]

You can do it this way:

In [94]: df
Out[94]:
          a         b         c         d           e         f         g           h
0 -1.450976 -1.361099 -0.411566  0.955718   99.882051 -1.166773 -0.468792  100.333169
1  0.049437 -0.169827  0.692466 -1.441196    0.446337 -2.134966 -0.407058   -0.251068
2 -0.084493 -2.145212 -0.634506  0.697951  101.279115 -0.442328 -0.470583   99.392245
3 -1.604788 -1.136284 -0.680803 -0.196149    2.224444 -0.117834 -0.299730   -0.098353
4 -0.751079 -0.732554  1.235118 -0.427149   99.899120  1.742388 -1.636730   99.822745
5  0.955484 -0.261814 -0.272451  1.039296    0.778508 -2.591915 -0.116368   -0.122376
6  0.395136 -1.155138 -0.065242 -0.519787  100.446026  1.584397  0.448349   99.831206
7 -0.691550  0.052180  0.827145  1.531527   -0.240848  1.832925 -0.801922   -0.298888
8 -0.673087 -0.791235 -1.475404  2.232781  101.521333 -0.424294  0.088186   99.553973
9  1.648968 -1.129342 -1.373288 -2.683352    0.598885  0.306705 -1.742007   -0.161067

In [95]: df[df.loc[:, 'e':].columns[(df.loc[:, 'e':] > 50).any()]]
Out[95]:
            e           h
0   99.882051  100.333169
1    0.446337   -0.251068
2  101.279115   99.392245
3    2.224444   -0.098353
4   99.899120   99.822745
5    0.778508   -0.122376
6  100.446026   99.831206
7   -0.240848   -0.298888
8  101.521333   99.553973
9    0.598885   -0.161067
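Applied to the election data from the question, the same pattern might look like the sketch below. The frame `results` is a hypothetical miniature: the column names `Constituency Name`, `Electorate`, `Unnamed: 9`, `Whig` and `Zeb` come from the question, while `Con`, `Lab` and all of the vote counts are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the election table; all values are invented.
results = pd.DataFrame({
    "Constituency Name": ["A", "B", "C"],
    "Electorate": [70000, 65000, 72000],
    "Unnamed: 9": [0, 0, 0],
    "Con": [18000, 22000, 30000],
    "Lab": [21000, 15000, 900],
    "Whig": [12, 0, 40],
    "Zeb": [0, 350, 0],
})

# Label-based slice: every column from 'Unnamed: 9' to the last one.
parties = results.loc[:, "Unnamed: 9":]

# Boolean Series indexed by column name: True where any value exceeds 1000.
keep = (parties > 1000).any()

# Keep only the party columns that passed the test.
filtered = parties.loc[:, keep]
print(filtered.columns.tolist())  # ['Con', 'Lab']
```

The threshold is applied per column, so a party stays in the result as soon as a single constituency exceeds 1000 votes.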

Explanation:

In [96]: (df.loc[:, 'e':] > 50).any()
Out[96]:
e     True
f    False
g    False
h     True
dtype: bool

In [97]: df.loc[:, 'e':].columns
Out[97]: Index(['e', 'f', 'g', 'h'], dtype='object')

In [98]: df.loc[:, 'e':].columns[(df.loc[:, 'e':] > 50).any()]
Out[98]: Index(['e', 'h'], dtype='object')
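Since `(df.loc[:, 'e':] > 50).any()` is a boolean Series indexed by column name, it can also be handed to `.loc` directly as the column selector, which saves the detour through `.columns`. The sketch below compares both forms on data shaped like the setup in this answer; the seeded generator is an assumption added only to make the run repeatable.

```python
import numpy as np
import pandas as pd

# Data shaped like the answer's setup; the seed is an assumption for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 8)), columns=list("abcdefgh"))
df.loc[::2, list("eh")] += 100

sub = df.loc[:, "e":]        # candidate columns e..h
mask = (sub > 50).any()      # one boolean per candidate column

via_columns = df[sub.columns[mask]]  # form used in the answer
via_loc = sub.loc[:, mask]           # boolean mask passed straight to .loc

print(via_loc.columns.tolist())      # ['e', 'h']
print(via_columns.equals(via_loc))   # True
```

Replacing `.any()` with `.all()` would instead keep only the columns in which every value satisfies the condition.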

Setup:

In [99]: df = pd.DataFrame(np.random.randn(10, 8), columns=list('abcdefgh'))

In [100]: df.loc[::2, list('eh')] += 100

Update:

Starting with Pandas 0.20.1, the .ix indexer is deprecated in favor of the stricter .loc and .iloc indexers.
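The `.ix` indexer used in the question was removed entirely in pandas 1.0, so on current releases the slice has to be written with `.loc` (labels) or `.iloc` (positions). Below is a minimal sketch that also keeps the descriptive columns in front of the filtered party columns. The frame `results` is hypothetical, as are the `Con` and `Lab` columns and every value in it.

```python
import pandas as pd

# Hypothetical miniature of the election table; all values are invented.
results = pd.DataFrame({
    "Constituency Name": ["A", "B", "C"],
    "Electorate": [70000, 65000, 72000],
    "Unnamed: 9": [0, 0, 0],
    "Con": [18000, 22000, 30000],
    "Lab": [21000, 15000, 900],
    "Whig": [12, 0, 40],
    "Zeb": [0, 350, 0],
})

# Integer position of the first party column (column labels are unique here).
first_party = results.columns.get_loc("Unnamed: 9")

meta = results.iloc[:, :first_party]     # descriptive columns
parties = results.iloc[:, first_party:]  # vote columns

# Re-attach the descriptive columns to the party columns that pass the test.
trimmed = pd.concat([meta, parties.loc[:, (parties > 1000).any()]], axis=1)
print(trimmed.columns.tolist())
# ['Constituency Name', 'Electorate', 'Con', 'Lab']
```

Splitting by position keeps the comparison away from the text columns, which would raise a `TypeError` if compared with a number.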