Python: drop rows with the same name and the same date but different times

python python-3.x pandas

I have a pandas DataFrame like this:

name     date
john     2021-01-19 06:30:29
tom      2021-03-21 19:30:01
tom      2021-03-21 22:02:34
sam      2021-02-14 13:13:21
sam      2021-02-16 10:15:55
kim      2021-04-01 15:10:44
sam      2021-01-23 13:13:21
sam      2021-02-16 17:11:12

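Is there a way to drop rows that share the same name and the same date, even when the times differ? It does not matter whether the first or the last row is kept. The output would then look like this: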

name     date
john     2021-01-19 06:30:29
tom      2021-03-21 19:30:01
sam      2021-02-14 13:13:21
sam      2021-02-16 10:15:55
kim      2021-04-01 15:10:44
sam      2021-01-23 13:13:21

You can create a new column that holds only the date, then use drop_duplicates() to remove rows whose name and date both match.

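df['date'] = pd.to_datetime(df['date'])  # parse the strings into datetimes
df['dateOnly'] = df['date'].dt.date  # helper column: calendar date without the time
# drop_duplicates keeps the first row of each (name, dateOnly) pair by default
df.drop_duplicates(subset=['name', 'dateOnly'], inplace=True)
df.drop(['dateOnly'], axis=1, inplace=True)  # remove the helper column again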
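
Since the question says either the first or the last row may be kept, here is a minimal sketch of the same idea written without a helper column and without modifying df in place. The names `day`, `is_dup` and `deduped` are illustrative only, and the small frame is a subset of the question's data.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['john', '2021-01-19 06:30:29'],
        ['tom', '2021-03-21 19:30:01'],
        ['tom', '2021-03-21 22:02:34'],
        ['sam', '2021-02-16 10:15:55'],
        ['sam', '2021-02-16 17:11:12'],
    ],
    columns=['name', 'date'],
)
df['date'] = pd.to_datetime(df['date'])

# Truncate each timestamp to midnight so rows on the same day compare equal.
day = df['date'].dt.normalize()

# True for every row whose (name, day) pair has already appeared earlier.
is_dup = pd.DataFrame({'name': df['name'], 'day': day}).duplicated(keep='first')

# Select the non-duplicate rows; df itself is left unchanged.
deduped = df[~is_dup]
```

Passing keep='last' to duplicated() would retain the latest row of each day instead of the earliest.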

Sample DataFrame

import pandas as pd

# First five rows of the data from the question
df = pd.DataFrame(
    [
        ['john', '2021-01-19 06:30:29'],
        ['tom', '2021-03-21 19:30:01'],
        ['tom', '2021-03-21 22:02:34'],
        ['sam', '2021-02-14 13:13:21'],
        ['sam', '2021-02-16 10:15:55'],
    ],
    columns=['name', 'date'],
)

Output:

    name    date
0   john    2021-01-19 06:30:29
1   tom     2021-03-21 19:30:01
3   sam     2021-02-14 13:13:21
4   sam     2021-02-16 10:15:55

A shorter version using groupby and first (assuming df['date'] is already a datetime):
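
The snippet for this version is not shown here, so the following is only a sketch of what a groupby/first approach might look like, not necessarily the answerer's exact code. It assumes the full frame from the question; the names `day` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['john', '2021-01-19 06:30:29'],
        ['tom', '2021-03-21 19:30:01'],
        ['tom', '2021-03-21 22:02:34'],
        ['sam', '2021-02-14 13:13:21'],
        ['sam', '2021-02-16 10:15:55'],
        ['kim', '2021-04-01 15:10:44'],
        ['sam', '2021-01-23 13:13:21'],
        ['sam', '2021-02-16 17:11:12'],
    ],
    columns=['name', 'date'],
)
df['date'] = pd.to_datetime(df['date'])

# Grouping key: the calendar day of each timestamp (time set to midnight).
day = df['date'].dt.normalize().rename('day')

result = (
    df.groupby(['name', day])['date']
      .first()                               # first timestamp in each (name, day) group
      .reset_index(level='day', drop=True)   # discard the helper grouping level
      .reset_index()                         # turn 'name' back into a column
)
```

Unlike drop_duplicates, groupby sorts by the group keys by default, so the rows come back ordered by name and day with a fresh index rather than in their original order.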

Output:

   name                date
0  john 2021-01-19 06:30:29
1   kim 2021-04-01 15:10:44
2   sam 2021-01-23 13:13:21
3   sam 2021-02-14 13:13:21
4   sam 2021-02-16 10:15:55
5   tom 2021-03-21 19:30:01