Python: dropping rows in pandas based on a more complex condition

I have the following DataFrame:

time        id  type
2012-12-19  1   abcF1
2013-11-02  1   xF1yz
2012-12-19  1   abcF1
2012-12-18  1   abcF1
2013-11-02  1   xF1yz
2006-07-07  5   F5spo
2006-07-06  5   F5spo
2005-07-07  5   F5abc

For a given id, I need to find the max date.

For that max date, I need to check the type.

I have to delete every row of that id whose type differs from the type at the max date.

Example of the target DataFrame:

time        id  type
<deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1>
2013-11-02  1   xF1yz
<deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1>
<deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1>
2013-11-02  1   xF1yz
2006-07-07  5   F5spo
2006-07-06  5   F5spo //kept because although the date is not max, it has the same type as the row with the max date for id 5
<deleted because for id 5 the date is not the max value and the type differs from the type of the max date for id 5>

How can I achieve this? I am new to pandas and am trying to learn the right way to use the library.

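Get the indices of the max values per group, keep only the columns `id` and `type` from those rows, and match the result against the original DataFrame.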
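
The code for this answer is missing, so here is a minimal sketch of what the description suggests. It assumes `time` is parsed as datetime and uses `DataFrameGroupBy.idxmax` with `DataFrame.merge`; these calls are my reading of the text, not necessarily what the answerer wrote.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})

# Index label of the row with the latest time for each id.
idx = df.groupby("id")["time"].idxmax()

# One (id, type) pair per id, taken from its max-date row.
max_types = df.loc[idx, ["id", "type"]]

# The inner merge keeps only rows whose (id, type) pair matches.
result = df.merge(max_types, on=["id", "type"])
print(result)
```

Because the merge is on both `id` and `type`, rows with an older date survive as long as their type equals the type of the max-date row.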

You can sort the DataFrame by time, then group by id and select the last row of each group. That is the row with the max date.

last_rows = df.sort_values('time').groupby('id').last()
Then merge the original DataFrame with the new one:

result = df.merge(last_rows, on=["id", "type"])
#       time_x  id   type      time_y
#0  2013-11-02   1  xF1yz  2013-11-02
#1  2013-11-02   1  xF1yz  2013-11-02
#2  2006-07-07   5  F5spo  2006-07-07
#3  2006-07-06   5  F5spo  2006-07-07
If needed, drop the duplicated last column:

result.drop('time_y', axis=1, inplace=True)
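
For anyone who wants to run this end to end, here is a self-contained variant. As a small change of my own (not part of the answer above), it keeps only `id` and `type` from the last rows, so the merge never creates a `time_y` column and the final `drop` is not needed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})

# Last row per id after sorting by time, reduced to the merge keys.
last_types = (
    df.sort_values("time")
      .groupby("id", as_index=False)
      .last()[["id", "type"]]
)

# Keep only rows whose (id, type) matches the max-date row of their id.
result = df.merge(last_types, on=["id", "type"])
print(result)
```

Note that `GroupBy.last()` returns the last non-null value of each column, which is the same as the last row only when the data has no missing values, as is the case here.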

Create a helper Series, then use it to filter the rows.
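
The function names and code for this answer are missing. One way to build such a helper Series, as a sketch under my own assumptions (`idxmax`, `set_index`, `Series.map` and `Series.eq` are my choices, and `type_of_max` is an illustrative name):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})

# Helper Series: id -> type of the row with the max date for that id.
idx = df.groupby("id")["time"].idxmax()
type_of_max = df.loc[idx].set_index("id")["type"]

# Look up the expected type for every row and compare with the actual type.
mask = df["type"].eq(df["id"].map(type_of_max))

# Boolean indexing keeps the original index and column layout.
result = df[mask]
print(result)
```

Unlike the merge-based variants, this keeps the original row labels, which can be handy if the result has to be aligned with other data later.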

Create a helper function that filters out the rows whose type differs from the type of the max date, then apply it to each group of `df`, grouped by `id`.

[out]:

df
    time        id  type    time_max  type_max  for_drop
3   2013-11-02  1   xF1yz   True      xF1yz     True
4   2013-11-02  1   xF1yz   False     xF1yz     True
6   2006-07-06  5   F5spo   True      F5spo     True
7   2006-07-07  5   F5spo   False     F5spo     True
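
The helper-function idea described in this answer could look roughly like the sketch below. The function name `keep_max_date_type` is hypothetical, and the helper columns from the printed output are not reproduced. I loop over the groups and use `pd.concat` instead of `groupby(...).apply`, because the way `apply` treats the grouping column has changed across pandas versions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})

def keep_max_date_type(group):
    """Keep the rows of one id group whose type equals the max-date type."""
    max_type = group.loc[group["time"].idxmax(), "type"]
    return group[group["type"] == max_type]

# Apply the helper to every id group and stitch the pieces back together.
result = pd.concat([keep_max_date_type(g) for _, g in df.groupby("id")])
print(result)
```

This is usually slower than the vectorized variants on large frames, but the per-group logic is easy to read and extend.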
Another approach is to use:
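
The method named here is missing. One option that fits, offered purely as my assumption and not as the answerer's code, is `GroupBy.transform`, which broadcasts a per-group value back to every row. The names `is_max` and `type_max` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})

# True for rows whose time equals the max time of their id.
is_max = df["time"].eq(df.groupby("id")["time"].transform("max"))

# Blank out the type on non-max rows, then broadcast the first
# remaining (non-null) type of each id to all rows of that id.
type_max = df["type"].where(is_max).groupby(df["id"]).transform("first")

# Keep rows whose own type matches the max-date type of their id.
result = df[df["type"].eq(type_max)]
print(result)
```

If an id had several max-date rows with different types, this sketch would pick the first one in row order; the question does not say how such ties should be handled.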


@DYZ - Yes for the second solution; for the first it is not necessary.
@jezrael Not sure I understand. How could this fail?