Python: keeping only a certain number of rows with a particular value in a DataFrame

python pandas

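Part of my DataFrame df is shown below. This is what it looks like after I use df = df.drop_duplicates('months_to_maturity'). However, for each group of rows with the same months_to_maturity, I would now like to keep number_of_rows_for_each_maturity rows with that particular maturity.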

   months_to_maturity                      orig_iss_dt  \
1                   6                    2015-06-25 00:00:00.0   
2                  12                    2015-06-25 00:00:00.0   
3                  18                    2015-06-30 00:00:00.0   
4                  24                    2015-06-15 00:00:00.0   
5                  30                    2015-06-30 00:00:00.0   

             maturity_dt  pay_freq_cd  coupon  closing_price  FACE_VALUE  
1  2015-12-24 00:00:00.0          NaN   0.000      99.960889         100  
2  2016-06-23 00:00:00.0          NaN   0.000      99.741444         100  
3  2017-06-30 00:00:00.0            2   0.625      99.968750         100  
4  2018-06-15 00:00:00.0            2   1.125     100.390625         100  
5  2020-06-30 00:00:00.0            2   1.625      99.984375         100

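I do this with the code below, where pairwise(df.iterrows()) yields the current row and the next row of the DataFrame. My problem is that I am reading the DataFrame from an Excel document with 600,000 rows, so I am wondering whether there is a better way to do this.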

Thank you
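Can you just do df.groupby('months_to_maturity').apply(len) before dropping the duplicates?
@chrisaycock Thanks. I'm just trying to understand what len means in your statement above.
@user131983 It is the Python built-in function len. len(df) returns the number of rows in the DataFrame. You can also do df.groupby('months_to_maturity').size()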
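The two suggestions from the comments can be tried side by side. The following is only a minimal sketch on made-up toy data (the values and the coupon column contents are illustrative, not the asker's Excel file); it shows that apply(len) and size() give the same per-maturity row counts.

```python
import pandas as pd

# Toy data with hypothetical values, standing in for the real 600,000-row frame
df = pd.DataFrame({
    "months_to_maturity": [6, 6, 6, 12, 12, 18],
    "coupon": [0.0, 0.0, 0.0, 0.0, 0.0, 0.625],
})

# Apply the built-in len to each group: one row count per maturity
counts_len = df.groupby("months_to_maturity").apply(len)

# size() returns the same counts without calling a Python function per group
counts_size = df.groupby("months_to_maturity").size()

print(counts_size)
```

Both results are Series indexed by months_to_maturity, so they can be compared directly against the number of rows one wants to keep per maturity.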
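from itertools import pairwise  # Python 3.10+; older versions can use the pairwise recipe from the itertools docs

# number_of_columns_and_rows is assumed to be defined earlier (not shown in the question)
number_of_rows_for_each_maturity = number_of_columns_and_rows // 60
count = 1        # rows kept so far in the current run of equal maturities
to_drop = []     # index labels of surplus rows
# Assumes rows with the same months_to_maturity are adjacent
for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
    if row1['months_to_maturity'] != row2['months_to_maturity']:
        count = 1                      # row2 starts a new run and is kept
    elif count < number_of_rows_for_each_maturity:
        count += 1                     # still under the cap, keep row2
    else:
        to_drop.append(i2)             # cap reached, mark row2 for removal
# drop() returns a new DataFrame, so the result must be assigned back
df = df.drop(to_drop)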
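As for the "better way" the question asks about, one possible vectorized alternative (not taken from the thread itself) is to let pandas cap each group instead of looping with iterrows(). The sketch below uses toy data and a hypothetical cap n; GroupBy.head(n) keeps the first n rows of every group, and the cumcount() variant expresses the same filter as a boolean mask.

```python
import pandas as pd

# Toy data with hypothetical values; the real frame is read from Excel
df = pd.DataFrame({
    "months_to_maturity": [6, 6, 6, 12, 12, 18],
    "closing_price": [99.96, 99.95, 99.97, 99.74, 99.75, 99.97],
})

n = 2  # hypothetical stand-in for number_of_rows_for_each_maturity

# Keep at most the first n rows of each maturity; original row order is preserved
capped = df.groupby("months_to_maturity").head(n)

# Same filter written as a mask: cumcount() numbers rows within each group from 0
capped_alt = df[df.groupby("months_to_maturity").cumcount() < n]

print(capped)
```

Unlike the pairwise loop, neither form requires rows with the same maturity to be adjacent, and both run on the original frame before any call to drop_duplicates.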