Python: How can I remove consecutive duplicate rows from a DataFrame while updating column values?


python pandas dataframe


I have the following data structure:

     |a       |b     |start_time  |end_time
0    |aaba    |d     |11:26       | 11:27
1    |aba     |c     |11:27       | 11:32
2    |aba     |c     |11:32       | 11:34
3    |cab     |ab    |11:34       | 11:35
4    |aba     |c     |11:35       | 11:40

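I want to merge consecutive rows that are duplicates on columns a and b, and then set the merged row's start_time and end_time to the earlier and the later of the two values, respectively.

Because the entries are consecutive, this amounts to keeping the start_time of the first row and the end_time of the second. Duplicates usually come in pairs, one right after the other.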

So in the example above, I want to merge rows 1 and 2, and end up with:

     |a       |b     |start_time  |end_time
0    |aaba    |d     |11:26       | 11:27
1    |aba     |c     |11:27       | 11:34
2    |cab     |ab    |11:34       | 11:35
3    |aba     |c     |11:35       | 11:40

I tried using loc, updating the end_time column in a first pass and dropping the duplicates in a second pass, but running loc twice seems wasteful:

df.loc[(df['a'] + df['b']) == (df['a'] + df['b']).shift(-1), 'end_time'] = df['end_time'].shift(-1)
df = df.loc[(df['a'] + df['b']) != (df['a'] + df['b']).shift(-1)]

Is there a way to remove the duplicates and update the end_time values in a single pass?

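Perform a groupby on a, b, and a cumulative sum that labels consecutive runs of b, with as_index=False, then agg each group with the minimum of start_time and the maximum of end_time: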

df.groupby(['a','b', df.b.ne(df.b.shift()).cumsum()], as_index=False).agg({'start_time': 'min', 'end_time': 'max'})

Out[1649]:
      a   b start_time end_time
0  aaba   d      11:26    11:27
1   aba   c      11:27    11:34
2   aba   c      11:35    11:40
3   cab  ab      11:34    11:35
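
Note that the output above is sorted by the group keys, so the cab row ends up last instead of in its original position, and the run label only tracks changes in b. The following is a minimal sketch of a variant that keeps the original row order and starts a new run whenever either a or b changes. The names changed, run_id and merged are hypothetical, the sample frame is rebuilt from the table in the question, and the times are assumed to be zero-padded HH:MM strings (so that min and max compare them correctly as text).

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "a": ["aaba", "aba", "aba", "cab", "aba"],
    "b": ["d", "c", "c", "ab", "c"],
    "start_time": ["11:26", "11:27", "11:32", "11:34", "11:35"],
    "end_time": ["11:27", "11:32", "11:34", "11:35", "11:40"],
})

# True wherever 'a' or 'b' differs from the previous row (the first row
# is compared against NaN, so it always starts a new run).
keys = df[["a", "b"]]
changed = (keys != keys.shift()).any(axis=1)

# Cumulative sum turns the change flags into one label per consecutive run.
run_id = changed.cumsum().rename("run_id")

# Group on the run label only; sort=False keeps runs in order of appearance.
merged = (
    df.groupby(run_id, sort=False)
      .agg(
          a=("a", "first"),
          b=("b", "first"),
          start_time=("start_time", "min"),
          end_time=("end_time", "max"),
      )
      .reset_index(drop=True)
)

print(merged)
```

Grouping on the run label alone avoids merging rows that share a and b but are not adjacent, and it does not depend on how as_index=False treats a grouping Series that has the same name as an existing column.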