Python: drop duplicates if timestamps are close
I have a dataframe containing "log" information about who is working, on which task, and the time he/she started working:
index | Entrance time | Name | Last name | Employee_ID | Task
--------------------------------------------------------------------
0 |2000-01-01 00:00:00 | John | Fischer | 001 | Maintenance
1 |2000-01-01 00:04:30 | John | Fischer | 001 | Development
2 |2000-01-01 00:04:30 | Bob | Conrad | 002 | Maintenance
3 |2000-01-01 00:10:00 | Mary | Smith | 003 | Multitasking
4 |2000-01-01 00:09:30 | John | Fischer | 001 | Maintenance
5 |2000-01-01 00:15:30 | John | Fischer | 001 | Maintenance
6 |2000-01-02 00:04:30 | Bob | Conrad | 002 | Maintenance
7 |2000-01-02 00:10:00 | Mary | Smith | 003 | Multitasking
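I want to eliminate duplicates: a row should be dropped if its entrance time is less than 10 minutes after another row with the same name and the same task. So the resulting dataframe should be: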
index | Entrance time | Name | Last name | Employee_ID | Task
--------------------------------------------------------------------
0 |2000-01-01 00:00:00 | John | Fischer | 001 | Maintenance
1 |2000-01-01 00:04:30 | John | Fischer | 001 | Development
2 |2000-01-01 00:04:30 | Bob | Conrad | 002 | Maintenance
3 |2000-01-01 00:10:00 | Mary | Smith | 003 | Multitasking
5 |2000-01-01 00:15:30 | John | Fischer | 001 | Maintenance
6 |2000-01-02 00:04:30 | Bob | Conrad | 002 | Maintenance
7 |2000-01-02 00:10:00 | Mary | Smith | 003 | Multitasking
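I used drop_duplicates(subset=["Name", "Last name", "Task"]), but I don't know how to apply the time condition that compares each row with the rest.
I hope you can help me. Thanks in advance.

Computing the time difference between rows may help you. However, you still need to apply your condition to the duplicate cases yourself.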
import numpy as np

# Make df sequential in ["Name", "Last name", "Task"]
df.sort_values(["Name", "Last name", "Task"], inplace=True)
# Compute the time difference to the previous row (assumes 'Entrance time' is datetime64).
# Note: the first row of each group is compared with the last row of the previous group.
temp = df['Entrance time'] - df['Entrance time'].shift()
# Convert the difference to minutes, taking the absolute value
df['diff_mins'] = temp.abs() / np.timedelta64(1, 'm')
Output:
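  | index | Entrance time       | Name | Last name | Employee_ID | Task         | diff_mins
--------------------------------------------------------------------------------------------
2 | 2     | 2000-01-01 00:04:30 | Bob  | Conrad    | 2           | Maintenance  | nan
6 | 6     | 2000-01-02 00:04:30 | Bob  | Conrad    | 2           | Maintenance  | 1440
1 | 1     | 2000-01-01 00:04:30 | John | Fischer   | 1           | Development  | 1440
0 | 0     | 2000-01-01 00:00:00 | John | Fischer   | 1           | Maintenance  | 4.5
4 | 4     | 2000-01-01 00:09:30 | John | Fischer   | 1           | Maintenance  | 9.5
5 | 5     | 2000-01-01 00:15:30 | John | Fischer   | 1           | Maintenance  | 6
3 | 3     | 2000-01-01 00:10:00 | Mary | Smith     | 3           | Multitasking | 5.5
7 | 7     | 2000-01-02 00:10:00 | Mary | Smith     | 3           | Multitasking | 1440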
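
To go from the time differences to the table the question asks for, one possible approach is sketched below. It is not part of the answer above. It assumes that "duplicate" means "less than 10 minutes after the last kept row with the same name and task", which is what the expected result suggests: row 4 is dropped, while row 5 is kept. The helper name drop_close_duplicates and its parameters are made up for illustration, and the sketch assumes the dataframe index is unique.

```python
import pandas as pd

def drop_close_duplicates(df, keys, time_col, window):
    """Keep a row only if it is at least `window` after the last KEPT row
    of its group (hypothetical helper; assumes a unique index)."""
    keep = []
    # Walk each group in chronological order
    ordered = df.sort_values(time_col, kind="mergesort")
    for _, group in ordered.groupby(keys, sort=False):
        last_kept = None
        for idx, ts in group[time_col].items():
            if last_kept is None or ts - last_kept >= window:
                keep.append(idx)
                last_kept = ts
    # Return the kept rows in their original order
    return df[df.index.isin(keep)]

# The sample data from the question
df = pd.DataFrame({
    "Entrance time": pd.to_datetime([
        "2000-01-01 00:00:00", "2000-01-01 00:04:30", "2000-01-01 00:04:30",
        "2000-01-01 00:10:00", "2000-01-01 00:09:30", "2000-01-01 00:15:30",
        "2000-01-02 00:04:30", "2000-01-02 00:10:00",
    ]),
    "Name": ["John", "John", "Bob", "Mary", "John", "John", "Bob", "Mary"],
    "Last name": ["Fischer", "Fischer", "Conrad", "Smith",
                  "Fischer", "Fischer", "Conrad", "Smith"],
    "Employee_ID": ["001", "001", "002", "003", "001", "001", "002", "003"],
    "Task": ["Maintenance", "Development", "Maintenance", "Multitasking",
             "Maintenance", "Maintenance", "Maintenance", "Multitasking"],
})

result = drop_close_duplicates(
    df, ["Name", "Last name", "Task"], "Entrance time", pd.Timedelta(minutes=10)
)
print(result)
```

Comparing against the last kept row, rather than simply the previous row, matters here: row 5 is only 6 minutes after row 4, so a plain row-to-row difference would drop it as well, whereas it is 15.5 minutes after row 0, the last row that was kept.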