Python: Filtering a DataFrame on Multiple Columns and Rows

python pandas

Given a DataFrame with the following format:

TEST_ID | ATOMIC_NUMBER | COMPOSITION_PERCENT | POSITION
1       | 28            | 49.84               | 0
1       | 22            | 50.01               | 0
1       | 47            | 0.06                | 1
2       | 22            | 49.84               | 0
2       | 47            | 50.01               | 1
3       | 28            | 49.84               | 0
3       | 22            | 50.01               | 0
3       | 47            | 0.06                | 0
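
To experiment with the table above, it can be reproduced as a pandas DataFrame. This is only a minimal sketch: the variable name `df` and the dtypes (integers for IDs, atomic numbers and positions, floats for percentages) are assumptions, since the post shows only the printed table.

```python
import pandas as pd

# Sample data copied from the table above; the dtypes are assumed.
df = pd.DataFrame(
    {
        "TEST_ID": [1, 1, 1, 2, 2, 3, 3, 3],
        "ATOMIC_NUMBER": [28, 22, 47, 22, 47, 28, 22, 47],
        "COMPOSITION_PERCENT": [49.84, 50.01, 0.06, 49.84, 50.01, 49.84, 50.01, 0.06],
        "POSITION": [0, 0, 1, 0, 1, 0, 0, 0],
    }
)

print(df)
```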

I want to select only the tests that have atomic numbers 22 and 28 at position 0, no more and no less. So I want a filter that returns:

TEST_ID | ATOMIC_NUMBER | COMPOSITION_PERCENT | POSITION
1       | 28            | 49.84               | 0
1       | 22            | 50.01               | 0
1       | 47            | 0.06                | 1

Edit: I am trying to translate this logic from SQL to Python. Here is the SQL code:

select * from compositions 
where compositions.test_id in (

  select a.test_id from (

    select test_id from compositions
    where test_id in (
      select test_id from (
        select * from COMPOSITIONS where position = 0 )
      group by test_id
      having count(test_id) = 2 )
    and atomic_number = 22) a

  join (

    select test_id from compositions
    where test_id in (
      select test_id from (
        select * from COMPOSITIONS where position = 0 )
      group by test_id
      having count(test_id) = 2 )
    and atomic_number = 28) b

  on a.test_id = b.test_id )

You can build a boolean Series that flags the matching test IDs, then use it to index df:

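# For each test, look only at the POSITION == 0 rows and check that there are
# exactly two atomic numbers and that they are 22 and 28.
s = df[df['POSITION'] == 0].groupby('TEST_ID').apply(lambda x: ((x['ATOMIC_NUMBER'].count() == 2 ) & (sorted(x['ATOMIC_NUMBER'].values.tolist()) == [22,28])).all())

# IDs of the tests that satisfy the condition.
test_id = s[s].index.tolist()

# Keep every row (at any position) that belongs to those tests.
df[df['TEST_ID'].isin(test_id)]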

    TEST_ID ATOMIC_NUMBER   COMPOSITION_PERCENT POSITION
0   1       28              49.84               0
1   1       22              50.01               0
2   1       47              0.06                1
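
As an alternative, the same filter can be written without `apply`, following the structure of the SQL more literally. This is a hedged sketch rather than the answerer's method: the names `pos0`, `has_two`, `has_22`, `has_28`, `keep` and `matching_ids` are made up for illustration, and the sample DataFrame is rebuilt with assumed dtypes so the block runs on its own.

```python
import pandas as pd

# Sample data copied from the question; the dtypes are assumed.
df = pd.DataFrame(
    {
        "TEST_ID": [1, 1, 1, 2, 2, 3, 3, 3],
        "ATOMIC_NUMBER": [28, 22, 47, 22, 47, 28, 22, 47],
        "COMPOSITION_PERCENT": [49.84, 50.01, 0.06, 49.84, 50.01, 49.84, 50.01, 0.06],
        "POSITION": [0, 0, 1, 0, 1, 0, 0, 0],
    }
)

# Only rows at position 0 take part in the per-test checks.
pos0 = df[df["POSITION"] == 0]

# Mirrors `having count(test_id) = 2`: exactly two elements at position 0.
has_two = pos0.groupby("TEST_ID")["ATOMIC_NUMBER"].size() == 2

# Mirror the `atomic_number = 22` and `atomic_number = 28` subqueries.
has_22 = pos0["ATOMIC_NUMBER"].eq(22).groupby(pos0["TEST_ID"]).any()
has_28 = pos0["ATOMIC_NUMBER"].eq(28).groupby(pos0["TEST_ID"]).any()

# A test qualifies only if all three conditions hold.
keep = has_two & has_22 & has_28
matching_ids = keep[keep].index

# Keep every row (at any position) that belongs to a qualifying test.
result = df[df["TEST_ID"].isin(matching_ids)]
print(result)
```

With the sample data only TEST_ID 1 qualifies: test 2 has a single element at position 0, and test 3 has three.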

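So what is 47 doing there? Can you post some code for what you have tried so far?
Hi @Vaishali, I think this question is more complicated than the one it was flagged against…
So 47 is selected because it is in the same group as the valid results?
@user3483203, because it has an extra element at position 0 (47).
Thanks, I'm going to try this.
@Brandon, sure. This solution only checks for two values in ATOMIC_NUMBER at position 0: one 22 and one 28. If that condition is met, it returns the test IDs, which are then used to filter the DataFrame.
@Brandon, glad it works, and thanks for accepting :)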