Python Pandas: delete rows for IDs that do not exist in every year

python pandas

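My dataset contains bank information for 2009 through 2019, but some banks were merged, acquired, or closed during that period, so I want to remove any bank that does not exist in every year from 2009 to 2019. For example, IDs 32 and 56 do not exist in 2019, so they should be removed. Here is what my data looks like: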

ID  Assets  Year
32    10    2009
45    5     2009
56    24    2009
78    9     2009
32    11    2010
45    6     2010
56    31    2010
78    14    2010
...   ...   ...
32    11    2018
45    13    2018
78    14    2018
45    13    2019
78    3     2019

Since only IDs 45 and 78 exist in every year from 2009 to 2019, all other rows should be removed. Here is what the result should look like:

ID  Assets  Year
45    5     2009
78    9     2009
45    6     2010
78    14    2010
...   ...   ...
45    13    2018
78    14    2018
45    13    2019
78    3     2019

Assuming you have a list of the IDs of closed banks:

closed = [56, 32]

df[~df['ID'].isin(closed)]

Conversely, assuming you have a list of the banks that still exist:

opened = [45, 78]

df[df['ID'].isin(opened)]
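As a minimal, self-contained sketch of the two filters above: the small frame below holds only the 2009 rows from the question, and the names `without_closed` and `only_opened` are illustrative, not from the original answer.

```python
import pandas as pd

# Only the 2009 rows from the question, enough to show the filters.
df = pd.DataFrame({
    "ID": [32, 45, 56, 78],
    "Assets": [10, 5, 24, 9],
    "Year": [2009, 2009, 2009, 2009],
})

closed = [56, 32]
opened = [45, 78]

# Drop rows whose ID is in the closed list.
without_closed = df[~df["ID"].isin(closed)]

# Keep rows whose ID is in the still-open list.
only_opened = df[df["ID"].isin(opened)]
```

On this sample both filters select the same rows, because every ID is in exactly one of the two lists.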

Edit based on the clarification of the question

Data:
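The data block itself did not survive on this page. A plausible reconstruction, assuming the sample is exactly the rows visible in the question's table (the elided "..." rows for 2011 to 2017 are unknown and left out), with the row order chosen to match the index values shown in the output further down:

```python
import pandas as pd

# Reconstructed sample: the visible rows from the question, in order.
# The rows hidden behind "..." in the question are not included.
df = pd.DataFrame({
    "ID":     [32, 45, 56, 78, 32, 45, 56, 78, 32, 45, 78, 45, 78],
    "Assets": [10, 5, 24, 9, 11, 6, 31, 14, 11, 13, 14, 13, 3],
    "Year":   [2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010,
               2018, 2018, 2018, 2019, 2019],
})
```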

If you need to work out the list of IDs that are present in every year:

## number of unique years each id needs to have:
year_count = len(df['Year'].unique())

## get number of unique years that each id has:
id_year_count = df[['Year','ID']].groupby(['ID', 'Year']).count().reset_index().groupby('ID').count().reset_index()

## filter to get the list of ids that match the condition:
opened_every_year = id_year_count['ID'][id_year_count['Year']==year_count].tolist()

## pass the list to the df to filter as described before
df = df[df['ID'].isin(opened_every_year)]
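The same idea can probably be written more compactly with `groupby(...).transform("nunique")`, which avoids building the intermediate ID list. This is an alternative sketch, not the answer's original code; the sample frame repeats the visible rows from the question, and `n_years`, `years_per_id`, and `result` are illustrative names.

```python
import pandas as pd

# Visible rows from the question (the "..." rows are unknown and omitted).
df = pd.DataFrame({
    "ID":     [32, 45, 56, 78, 32, 45, 56, 78, 32, 45, 78, 45, 78],
    "Assets": [10, 5, 24, 9, 11, 6, 31, 14, 11, 13, 14, 13, 3],
    "Year":   [2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010,
               2018, 2018, 2018, 2019, 2019],
})

# Number of distinct years present anywhere in the data.
n_years = df["Year"].nunique()

# For every row, the number of distinct years its ID appears in.
years_per_id = df.groupby("ID")["Year"].transform("nunique")

# Keep only rows whose ID appears in all of those years.
result = df[years_per_id == n_years]
```

Because `transform` returns a Series aligned with the original rows, it can be used directly as a boolean mask.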

Check:

df.sort_values(['ID', 'Year'])

Output:

    Assets  ID  Year
1        5  45  2009
5        6  45  2010
9       13  45  2018
11      13  45  2019
3        9  78  2009
7       14  78  2010
10      14  78  2018
12       3  78  2019

(Note that the data sample only contains 4 years of data, so it selects the IDs that have all 4 of those years.)
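One consequence of counting the years found in the data is that a year missing for every bank is never required. If the required range is known in advance, a variant can check against it explicitly. The sketch below is an assumption-laden illustration: the helper `keep_ids_covering` and the `toy` frame are hypothetical and not part of the original answer.

```python
import pandas as pd

def keep_ids_covering(df, first_year, last_year):
    """Return rows within [first_year, last_year] for IDs present in every year of that range."""
    in_range = df[df["Year"].between(first_year, last_year)]
    # Distinct years per ID inside the required range.
    counts = in_range.groupby("ID")["Year"].nunique()
    required = last_year - first_year + 1
    complete_ids = counts[counts == required].index
    return in_range[in_range["ID"].isin(complete_ids)]

# Hypothetical toy data: only ID 1 is present in 2009, 2010 and 2011.
toy = pd.DataFrame({
    "ID":   [1, 1, 1, 2, 2, 3, 3],
    "Year": [2009, 2010, 2011, 2009, 2011, 2010, 2011],
})

kept = keep_ids_covering(toy, 2009, 2011)
```

For the question's full dataset the call would presumably be `keep_ids_covering(df, 2009, 2019)`.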

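Do you have a list of the banks that need to be removed, or do you need to work that out as well?

Unfortunately, I don't have a list of banks...

Then how do you want to filter them? Only banks that exist in 2019? What is the condition for knowing that a bank still exists?

If possible, I want banks that exist from 2009 through 2019, so if a bank was founded in 2012, it should be removed even if it still exists in 2019.

So, banks that have an ID in every year?

Thank you very much for your help. I got the result I wanted!

Happy to help. Please upvote and accept the answer :)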