Is there a way in Python to keep the rows that meet a condition and drop the rest?

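I have the following dataframe (df)

The problem: when two rows have the same end time, I want to drop the row with the longer duration and keep the row with the shortest duration.

Expected result: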

                ID  start                      end                     Diff
                A   1/8/2020 12:00:05 AM       1/8/2020 12:00:10 AM    5
                B   1/9/2020 1:00:06 AM        1/9/2020 1:00:10 AM     4
                B   1/9/2020 1:00:20 AM        1/9/2020 1:00:25 AM     5
                C   1/10/2020 5:00:05 AM       1/10/2020 5:00:25 AM    20
                C   1/10/2020 5:00:40 AM       1/10/2020 5:00:45 AM    5
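
Essentially, when the end times are the same, I want to drop the row with the longer duration. I tried the approach below, but it does not handle this case: when the end times are the same, keep the row with the shorter duration.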

                df.sort_values(['Diff']).drop_duplicates(subset=['ID'])

Thanks for any suggestions.
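To see why the attempt above falls short, here is a minimal sketch on a small frame. The question's input frame is not shown, so the rows at index 0 and 2 are hypothetical values invented for illustration; the other rows are taken from the expected result. Deduplicating on ID alone keeps a single row per ID, so legitimate rows with a different end time are lost. Deduplicating on both ID and end is one possible adjustment, not necessarily the approach the answers take.

```python
import pandas as pd

# Rows 0 and 2 are hypothetical; rows 1 and 3-6 come from the expected result.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B", "C", "C"],
    "start": [
        "1/8/2020 12:00:00 AM", "1/8/2020 12:00:05 AM",
        "1/9/2020 1:00:00 AM", "1/9/2020 1:00:06 AM", "1/9/2020 1:00:20 AM",
        "1/10/2020 5:00:05 AM", "1/10/2020 5:00:40 AM",
    ],
    "end": [
        "1/8/2020 12:00:10 AM", "1/8/2020 12:00:10 AM",
        "1/9/2020 1:00:10 AM", "1/9/2020 1:00:10 AM", "1/9/2020 1:00:25 AM",
        "1/10/2020 5:00:25 AM", "1/10/2020 5:00:45 AM",
    ],
    "Diff": [10, 5, 10, 4, 5, 20, 5],
})

# Original attempt: one row survives per ID, whatever its end time.
attempt = df.sort_values(["Diff"]).drop_duplicates(subset=["ID"])

# Adjusted sketch: one row survives per (ID, end) pair, the one with the
# smallest Diff because the frame is sorted by Diff before deduplicating.
fixed = (
    df.sort_values(["Diff"])
      .drop_duplicates(subset=["ID", "end"])
      .sort_index()
)

print(len(attempt), "rows from the attempt,", len(fixed), "rows after the adjustment")
```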

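Use groupby on the end column with transform to get the minimum of Diff, then compare that with df['Diff'] and keep the rows where the comparison is True. See below how transform returns the minimum across the whole group: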

df[df['Diff'].eq(df.groupby('end')['Diff'].transform('min'))]


Output of groupby + transform:

print(df.groupby('end')['Diff'].transform('min'))

0     5
1     5
2     4
3     4
4     5
5    20
6     5
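
The steps above can be reproduced end to end with a small frame. This is a sketch only: the input frame is not shown in the question, so the rows at index 0 and 2 are hypothetical and were chosen so that the per-group minimum matches the printed output.

```python
import pandas as pd

# Rows 0 and 2 are hypothetical; the rest come from the expected result.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B", "C", "C"],
    "end": [
        "1/8/2020 12:00:10 AM", "1/8/2020 12:00:10 AM",
        "1/9/2020 1:00:10 AM", "1/9/2020 1:00:10 AM", "1/9/2020 1:00:25 AM",
        "1/10/2020 5:00:25 AM", "1/10/2020 5:00:45 AM",
    ],
    "Diff": [10, 5, 10, 4, 5, 20, 5],
})

# transform('min') broadcasts each group's minimum back to every row of that
# group, so the result has the same length and index as df.
group_min = df.groupby("end")["Diff"].transform("min")

# Keep only the rows whose Diff equals the minimum of their group.
result = df[df["Diff"].eq(group_min)]
print(result)
```

Two points to keep in mind. If several rows in a group tie for the minimum, this mask keeps all of them. And grouping on end alone compares rows across IDs; if two different IDs could share an end time, grouping on ['ID', 'end'] is probably the safer choice.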

We can sort by start, so that among rows with the same end the shorter duration naturally comes last, and then use drop_duplicates:

df.sort_values(['ID', 'start', 'end']).drop_duplicates(['ID', 'end'], keep='last')

  ID               start                 end  Diff
1  A 2020-01-08 00:00:05 2020-01-08 00:00:10     5
3  B 2020-01-09 01:00:06 2020-01-09 01:00:10     4
4  B 2020-01-09 01:00:20 2020-01-09 01:00:25     5
5  C 2020-01-10 05:00:05 2020-01-10 05:00:25    20
6  C 2020-01-10 05:00:40 2020-01-10 05:00:45     5
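
The output above shows real datetimes, so the columns were presumably converted before sorting. A minimal sketch of the full sequence follows, assuming start and end arrive as strings in the format shown in the question; the format string and the rows at index 0 and 2 are assumptions made for illustration.

```python
import pandas as pd

# Rows 0 and 2 are hypothetical; the rest come from the expected result.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B", "C", "C"],
    "start": [
        "1/8/2020 12:00:00 AM", "1/8/2020 12:00:05 AM",
        "1/9/2020 1:00:00 AM", "1/9/2020 1:00:06 AM", "1/9/2020 1:00:20 AM",
        "1/10/2020 5:00:05 AM", "1/10/2020 5:00:40 AM",
    ],
    "end": [
        "1/8/2020 12:00:10 AM", "1/8/2020 12:00:10 AM",
        "1/9/2020 1:00:10 AM", "1/9/2020 1:00:10 AM", "1/9/2020 1:00:25 AM",
        "1/10/2020 5:00:25 AM", "1/10/2020 5:00:45 AM",
    ],
    "Diff": [10, 5, 10, 4, 5, 20, 5],
})

# Parse the strings so that sorting is chronological rather than lexical.
fmt = "%m/%d/%Y %I:%M:%S %p"  # assumed format, inferred from the sample values
df["start"] = pd.to_datetime(df["start"], format=fmt)
df["end"] = pd.to_datetime(df["end"], format=fmt)

# Within one (ID, end) pair, the latest start is the shortest duration,
# so keeping the last row after sorting keeps the shortest one.
result = (
    df.sort_values(["ID", "start", "end"])
      .drop_duplicates(["ID", "end"], keep="last")
)
print(result)
```

Unlike the mask-based approach, this keeps exactly one row per (ID, end) pair even when durations tie.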

Sort by ID, end and Diff, then take the first row of each (ID, end) group, which is the one with the shortest Diff:

>>> df.sort_values(['ID', 'end', 'Diff']).groupby(['ID', 'end'], sort=False).head(1)

  ID                 start                   end  Diff
1  A  1/8/2020 12:00:05 AM  1/8/2020 12:00:10 AM     5
3  B   1/9/2020 1:00:06 AM   1/9/2020 1:00:10 AM     4
4  B   1/9/2020 1:00:20 AM   1/9/2020 1:00:25 AM     5
5  C  1/10/2020 5:00:05 AM  1/10/2020 5:00:25 AM    20
6  C  1/10/2020 5:00:40 AM  1/10/2020 5:00:45 AM     5
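
For comparison, here is the same sort-and-head idea next to an alternative that skips the sort and looks up the index of the minimum in each group with idxmin. The idxmin variant is not from the answer above, just another way to express it, and the rows at index 0 and 2 are hypothetical.

```python
import pandas as pd

# Rows 0 and 2 are hypothetical; the rest come from the expected result.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B", "C", "C"],
    "end": [
        "1/8/2020 12:00:10 AM", "1/8/2020 12:00:10 AM",
        "1/9/2020 1:00:10 AM", "1/9/2020 1:00:10 AM", "1/9/2020 1:00:25 AM",
        "1/10/2020 5:00:25 AM", "1/10/2020 5:00:45 AM",
    ],
    "Diff": [10, 5, 10, 4, 5, 20, 5],
})

# Approach from the answer: sort so the smallest Diff is first in each group,
# then take the first row per (ID, end).
by_head = (
    df.sort_values(["ID", "end", "Diff"])
      .groupby(["ID", "end"], sort=False)
      .head(1)
)

# Alternative sketch: idxmin returns the index label of the smallest Diff
# in each group; .loc then selects those rows.
by_idxmin = df.loc[df.groupby(["ID", "end"])["Diff"].idxmin()].sort_index()

print(by_head)
print(by_idxmin)
```

Both variants return one row per group, so ties on Diff are resolved by keeping a single row.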

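OK, I'll give it a try. Can you explain what the code is doing? I'm still learning Pandas.
@TanishaHudson Just updated the answer with an explanation :) Let me know if you need anything else.
This works perfectly. I have to wait 10 minutes before I can vote. Thank you very much.
This was also very helpful. Thank you very much.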

The boolean mask produced by the comparison:

print(df['Diff'].eq(df.groupby('end')['Diff'].transform('min')))

0    False
1     True
2    False
3     True
4     True
5     True
6     True

Equivalently, compute the minimum per end value and map it back onto the end column:

df[df['Diff'].eq(df['end'].map(df.groupby('end')['Diff'].min()))]

  ID                 start                   end  Diff
1  A  1/8/2020 12:00:05 AM  1/8/2020 12:00:10 AM     5
3  B   1/9/2020 1:00:06 AM   1/9/2020 1:00:10 AM     4
4  B   1/9/2020 1:00:20 AM   1/9/2020 1:00:25 AM     5
5  C  1/10/2020 5:00:05 AM  1/10/2020 5:00:25 AM    20
6  C  1/10/2020 5:00:40 AM  1/10/2020 5:00:45 AM     5
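
As a quick check that the map form and the transform form select the same rows, the sketch below builds both masks on a small frame. The rows at index 0 and 2 are hypothetical; the names per_end_min, mask_map and mask_transform are illustrative only.

```python
import pandas as pd

# Rows 0 and 2 are hypothetical; the rest come from the expected result.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B", "C", "C"],
    "end": [
        "1/8/2020 12:00:10 AM", "1/8/2020 12:00:10 AM",
        "1/9/2020 1:00:10 AM", "1/9/2020 1:00:10 AM", "1/9/2020 1:00:25 AM",
        "1/10/2020 5:00:25 AM", "1/10/2020 5:00:45 AM",
    ],
    "Diff": [10, 5, 10, 4, 5, 20, 5],
})

# One minimum per distinct end value, indexed by that end value.
per_end_min = df.groupby("end")["Diff"].min()

# map looks up each row's end value in per_end_min, giving a row-aligned Series.
mask_map = df["Diff"].eq(df["end"].map(per_end_min))

# transform does the aggregation and the broadcast in a single step.
mask_transform = df["Diff"].eq(df.groupby("end")["Diff"].transform("min"))

print((mask_map == mask_transform).all())
```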