Python: How do I compare and drop rows within a groupby in pandas?
Tags: python, pandas, dataframe, python-3.7
I have a df that looks like this:
datetime policyid score
0 1970-01-01 00:00:01.593560812 9876policyID1234567890 0
1 1970-01-01 00:00:01.593560814 9876policyID1234567890 0
2 1970-01-01 00:00:01.593560958 9876policyID1234567890 1
3 1970-01-01 00:00:01.593560964 9876policyID1234567890 1
I want to group by policyid and score, but keep only the row with the greatest timestamp for each identical policyid and score pair.
This is the groupby I am doing:
df.groupby(['policyid','score'])
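At this point, I am not sure how to compare the timestamps between the rows in each group and keep the row with the greater timestamp.
The new DF should look like this: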
datetime policyid score
1 1970-01-01 00:00:01.593560814 9876policyID1234567890 0
3 1970-01-01 00:00:01.593560964 9876policyID1234567890 1
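Thanks in advance.
You can sort the values by datetime with sort_values, then remove the duplicates with drop_duplicates. Because the rows are sorted in ascending order, keep='last' retains the row with the greatest timestamp for each policyid and score pair: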
df=df.sort_values('datetime').drop_duplicates(['policyid','score'], keep='last')
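To try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same sort-then-deduplicate step. The way the frame is constructed (parsing the timestamps with pd.to_datetime) is an assumption for illustration; the original data may have been loaded differently.

```python
import pandas as pd

# Sample data copied from the question. Parsing the strings with
# pd.to_datetime is an assumption about how the column was built.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "1970-01-01 00:00:01.593560812",
        "1970-01-01 00:00:01.593560814",
        "1970-01-01 00:00:01.593560958",
        "1970-01-01 00:00:01.593560964",
    ]),
    "policyid": ["9876policyID1234567890"] * 4,
    "score": [0, 0, 1, 1],
})

# Sort ascending by timestamp so the newest row of each
# (policyid, score) pair comes last, then keep only that last row.
result = df.sort_values("datetime").drop_duplicates(
    ["policyid", "score"], keep="last"
)

print(result)
```

The original index labels are preserved, so the surviving rows can be traced back to the input; chain .reset_index(drop=True) if a fresh 0..n-1 index is preferred.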
Great, that did it!
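If you would rather stay with the groupby from the question, one possible alternative is to ask each group for the index label of its largest timestamp with idxmax and select those rows. This is only a sketch, not the answer's method: it assumes the datetime column has a real datetime dtype and that the index labels are unique. The sample frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question (construction is illustrative).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "1970-01-01 00:00:01.593560812",
        "1970-01-01 00:00:01.593560814",
        "1970-01-01 00:00:01.593560958",
        "1970-01-01 00:00:01.593560964",
    ]),
    "policyid": ["9876policyID1234567890"] * 4,
    "score": [0, 0, 1, 1],
})

# For each (policyid, score) group, idxmax returns the index label
# of the row holding the largest datetime value.
latest_idx = df.groupby(["policyid", "score"])["datetime"].idxmax()

# Select those rows from the original frame by label.
result = df.loc[latest_idx]

print(result)
```

Unlike the sort-based version, this does not reorder the frame first; the rows come back in group order.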