Python: dropping rows in pandas based on a more complex condition
I have the following DataFrame:
time id type
2012-12-19 1 abcF1
2013-11-02 1 xF1yz
2012-12-19 1 abcF1
2012-12-18 1 abcF1
2013-11-02 1 xF1yz
2006-07-07 5 F5spo
2006-07-06 5 F5spo
2005-07-07 5 F5abc
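For a given id, I need to find the max date.
For that max date, I need to check the type.
I have to delete every row for that id whose type differs from the type of the row with the max date.
Example of the target DataFrame: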
time id type
<deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1>
2013-11-02 1 xF1yz
<deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1>
<deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1>
2013-11-02 1 xF1yz
2006-07-07 5 F5spo
2006-07-06 5 F5spo //kept because although the date is not max, it has the same type as the row with the max date for id 5
<deleted because for id 5 the date is not the max value and the type differs from the type of the max date for id 5>
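How can I achieve this?
I am new to pandas and am trying to learn the proper way to use the library.
Get the index of the maximum time for each id, select only the id and type columns from those rows, and merge the result with the original DataFrame. A filter on the original DataFrame can be used in place of the merge.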
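The code for this answer is not shown, so the following is only a minimal sketch of the idea as described, not the answerer's exact solution. It assumes `time` has been parsed as datetimes and uses `idxmax` to locate the latest row per id, which is a choice made here for illustration. If several rows of one id share the max date, `idxmax` picks the first of them.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})

# Index label of the row with the latest time for each id.
latest_idx = df.groupby("id")["time"].idxmax()

# Keep only the (id, type) pair of each of those rows.
latest = df.loc[latest_idx, ["id", "type"]]

# An inner merge keeps only the rows whose (id, type) matches a latest pair.
result = df.merge(latest, on=["id", "type"])
print(result)
```

Note that the merge produces a fresh 0..n-1 index, so the original row labels are not preserved.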
You can sort the DataFrame by time, group it by id, and select the last row of each group. That is the row with the latest date:
last_rows = df.sort_values('time').groupby('id').last()
Then merge the original DataFrame with the new one:
result = df.merge(last_rows, on=["id", "type"])
# time_x id type time_y
#0 2013-11-02 1 xF1yz 2013-11-02
#1 2013-11-02 1 xF1yz 2013-11-02
#2 2006-07-07 5 F5spo 2006-07-07
#3 2006-07-06 5 F5spo 2006-07-07
If needed, drop the duplicated time column at the end:
result.drop('time_y', axis=1, inplace=True)
Create helper Series, then use them to filter the rows:
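The original code for the helper-Series approach is missing here, so this is a hedged sketch of one way it could look. The names `is_latest`, `type_at_max`, and `wanted_type` are hypothetical, and the use of `transform("max")` and `map` is an assumption rather than the answerer's confirmed method.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})

# Helper Series 1: True where a row carries the latest time of its id.
is_latest = df["time"].eq(df.groupby("id")["time"].transform("max"))

# Helper Series 2: the type of the latest row, indexed by id
# (on ties, the first latest row of each id is used).
type_at_max = df[is_latest].drop_duplicates("id").set_index("id")["type"]

# Broadcast the wanted type back to every row, then filter.
wanted_type = df["id"].map(type_at_max)
result = df[df["type"].eq(wanted_type)]
print(result)
```

Unlike a merge, boolean filtering keeps the original index labels of the surviving rows.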
Create a helper function that filters out the rows whose type differs from the type at the max date, then apply it to each group of df, grouped by id.
[out]:
df
time id type time_max type_max for_drop
3 2013-11-02 1 xF1yz True xF1yz True
4 2013-11-02 1 xF1yz False xF1yz True
6 2006-07-06 5 F5spo True F5spo True
7 2006-07-07 5 F5spo False F5spo True
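The helper-function approach described above can be sketched as follows. The function name `keep_latest_type` is hypothetical, and iterating over the groups with `pd.concat` is a choice made here for illustration; the original answer may have applied the function differently. The sketch assumes the DataFrame has a unique index.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-12-19", "2013-11-02", "2012-12-19", "2012-12-18",
        "2013-11-02", "2006-07-07", "2006-07-06", "2005-07-07",
    ]),
    "id": [1, 1, 1, 1, 1, 5, 5, 5],
    "type": ["abcF1", "xF1yz", "abcF1", "abcF1",
             "xF1yz", "F5spo", "F5spo", "F5abc"],
})


def keep_latest_type(group):
    """Keep only the rows whose type equals the type of the latest row."""
    latest_type = group.loc[group["time"].idxmax(), "type"]
    return group[group["type"] == latest_type]


# Apply the helper to each id group and stitch the pieces back together.
result = pd.concat([keep_latest_type(g) for _, g in df.groupby("id")])
print(result)
```

This version is easy to read but runs Python code once per group, so a vectorized filter is likely to be faster on large inputs.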
@DYZ - yes for the second solution; for the first it is not necessary.
@jezrael Not sure I understand. How does this fail?