How to merge time spans in Python to produce larger time spans


I have the following dataframe:

       padel start_time  end_time  duration
38  Padel 10   08:00:00  09:00:00        60
40  Padel 10   10:00:00  11:30:00        90
42  Padel 10   10:30:00  12:00:00        90
44  Padel 10   11:00:00  12:30:00        90
46  Padel 10   11:30:00  13:00:00        90
49  Padel 10   16:00:00  17:30:00        90
51  Padel 10   16:30:00  18:00:00        90
53  Padel 10   17:00:00  18:30:00        90
55  Padel 10   17:30:00  19:00:00        90
57  Padel 10   18:00:00  19:30:00        90
59  Padel 10   18:30:00  20:00:00        90
61  Padel 10   19:00:00  20:30:00        90
63  Padel 10   19:30:00  21:00:00        90
65  Padel 10   20:00:00  21:30:00        90
67  Padel 10   20:30:00  22:00:00        90
I want to merge the overlapping rows into the longest continuous time spans. My desired output looks like this:

       padel start_time  end_time  duration
38  Padel 10   08:00:00  09:00:00        60
40  Padel 10   10:00:00  13:00:00       180
49  Padel 10   16:00:00  22:00:00       360
I don't care about the duration; I can compute that myself. But how would I merge the overlapping time spans?
Thanks

I can't think of a simple way, so I would just use a for loop. I haven't tested this code, but something like:

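import pandas as pd

# Sort key assumed to be start_time; the original left it as "..."
df = df.sort_values('start_time')
merged = []      # finished, merged rows
next_row = None  # span currently being extended

for _, row in df.iterrows():
    if next_row is None:
        next_row = row.copy()
    elif row['start_time'] <= next_row['end_time']:
        # Overlap: extend the current span
        next_row['end_time'] = max(next_row['end_time'], row['end_time'])
    else:
        # Gap: store the finished span and start a new one from this row
        merged.append(next_row)
        next_row = row.copy()

if next_row is not None:
    merged.append(next_row)
out_df = pd.DataFrame(merged, columns=df.columns)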
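If the data contains more than one padel, the loop should only merge rows that belong to the same padel. Below is a minimal sketch of the same loop wrapped in a function. The name `merge_spans` is hypothetical, and it assumes the times are zero-padded `HH:MM:SS` strings, so that string comparison matches chronological order.

```python
import pandas as pd


def merge_spans(df):
    """Merge overlapping spans per padel with a plain loop (hypothetical helper)."""
    df = df.sort_values(['padel', 'start_time'])
    merged = []     # finished spans, one dict per output row
    current = None  # span currently being extended
    for _, row in df.iterrows():
        same_padel = current is not None and row['padel'] == current['padel']
        if same_padel and row['start_time'] <= current['end_time']:
            # Overlapping (or touching) span: extend the one being built
            current['end_time'] = max(current['end_time'], row['end_time'])
        else:
            # Gap or different padel: store the finished span, start a new one
            if current is not None:
                merged.append(current)
            current = row.to_dict()
    if current is not None:
        merged.append(current)
    return pd.DataFrame(merged, columns=df.columns).reset_index(drop=True)


# Small made-up sample in the same shape as the question's data
sample = pd.DataFrame({
    'padel': ['Padel 10'] * 4,
    'start_time': ['08:00:00', '10:00:00', '10:30:00', '16:00:00'],
    'end_time': ['09:00:00', '11:30:00', '12:00:00', '17:30:00'],
})
result = merge_spans(sample)
print(result)
```

Taking the `max` of the two end times keeps the result correct even when a later row ends before the span it overlaps.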
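  • You can create groups with shift(): whenever a row's start_time is greater than the previous row's end_time (i.e. the two spans do not overlap), a new group begins.
  • shift() produces NaN in the first row, so we fill the NA with '24:00:00'. No time within a day can exceed 24 hours, so the first row compares as False and falls into the first group.
  • The comparison returns a boolean Series of True and False (i.e. 1 and 0 respectively), so you can simply take the cumulative sum with cumsum.
  • This creates a grp object that we can include in the groupby.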


  • Full code, including the input dataframe:

    import pandas as pd

    df = pd.DataFrame(pd.DataFrame({'padel': {38: 'Padel 10',
      40: 'Padel 10',
      42: 'Padel 10',
      44: 'Padel 10',
      46: 'Padel 10',
      49: 'Padel 10',
      51: 'Padel 10',
      53: 'Padel 10',
      55: 'Padel 10',
      57: 'Padel 10',
      59: 'Padel 10',
      61: 'Padel 10',
      63: 'Padel 10',
      65: 'Padel 10',
      67: 'Padel 10'},
     'start_time': {38: '08:00:00',
      40: '10:00:00',
      42: '10:30:00',
      44: '11:00:00',
      46: '11:30:00',
      49: '16:00:00',
      51: '16:30:00',
      53: '17:00:00',
      55: '17:30:00',
      57: '18:00:00',
      59: '18:30:00',
      61: '19:00:00',
      63: '19:30:00',
      65: '20:00:00',
      67: '20:30:00'},
     'end_time': {38: '09:00:00',
      40: '11:30:00',
      42: '12:00:00',
      44: '12:30:00',
      46: '13:00:00',
      49: '17:30:00',
      51: '18:00:00',
      53: '18:30:00',
      55: '19:00:00',
      57: '19:30:00',
      59: '20:00:00',
      61: '20:30:00',
      63: '21:00:00',
      65: '21:30:00',
      67: '22:00:00'},
     'duration': {38: 60,
      40: 90,
      42: 90,
      44: 90,
      46: 90,
      49: 90,
      51: 90,
      53: 90,
      55: 90,
      57: 90,
      59: 90,
      61: 90,
      63: 90,
      65: 90,
      67: 90}}))
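    # Start a new group whenever start_time is later than the previous row's end_time
    grp = df['start_time'].gt(df['end_time'].shift().fillna('24:00:00')).cumsum()
    # Within each group, keep the first start_time and the last end_time
    df = df.groupby([grp, 'padel'], as_index=False).agg({'start_time': 'first', 'end_time': 'last'})
    # Recompute the duration in minutes
    df['duration'] = ((pd.to_timedelta(df['end_time']) -
                       pd.to_timedelta(df['start_time'])).dt.seconds / 60).astype(int)
    df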
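The code above relies on the rows being sorted by start_time and on each end_time being later than the one before it. Below is a hedged sketch of a variant that compares against the running maximum of end_time and restarts for each padel. The helper name `merge_overlaps`, the temporary columns and the demo data are assumptions for illustration, not part of the answer. It also assumes zero-padded `HH:MM:SS` strings.

```python
import pandas as pd


def merge_overlaps(df):
    """Merge overlapping spans per padel without a Python loop (hypothetical helper)."""
    out = df.sort_values(['padel', 'start_time']).copy()
    # Numeric seconds since midnight, used only for the comparisons
    out['start_s'] = pd.to_timedelta(out['start_time']).dt.total_seconds()
    out['end_s'] = pd.to_timedelta(out['end_time']).dt.total_seconds()
    # Latest end seen so far within each padel, taken from the rows before this one
    prev_end = out.groupby('padel')['end_s'].transform(lambda s: s.cummax().shift())
    # New group at the first row of a padel (prev_end is NaN) or after a gap
    new_group = prev_end.isna() | (out['start_s'] > prev_end)
    out['grp'] = new_group.cumsum()
    merged = out.groupby(['padel', 'grp'], as_index=False).agg(
        start_time=('start_time', 'first'),
        end_time=('end_time', 'max'),  # string max is valid for zero-padded times
        start_s=('start_s', 'min'),
        end_s=('end_s', 'max'),
    )
    merged['duration'] = ((merged['end_s'] - merged['start_s']) // 60).astype(int)
    return merged[['padel', 'start_time', 'end_time', 'duration']]


# Made-up data with a nested span (10:30-11:00 lies inside 10:00-13:00)
demo = pd.DataFrame({
    'padel': ['Padel 10', 'Padel 10', 'Padel 10', 'Padel 10', 'Padel 11'],
    'start_time': ['10:00:00', '10:30:00', '12:30:00', '16:00:00', '10:00:00'],
    'end_time': ['13:00:00', '11:00:00', '14:00:00', '17:00:00', '11:00:00'],
})
result = merge_overlaps(demo)
print(result)
```

Comparing against the running maximum matters for nested spans: a short span inside a longer one would otherwise make the next row look like the start of a new group.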
    

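    Do you have to do this per group? Is there a padel column?
    Good question. If so, add padel to the sort (as the first key), and add the condition row['padel'] == next_row['padel'] (joined with and) to the elif.
    This does not give the correct output: Out[61]: padel start_time end_time duration 0 padel 10 09:00:00 14:00:00 300 1 padel 10 15:00:00 10:00:00 1140
    @gulbazkhan My answer exactly matches your desired output.
    But my result runs from 9 to 14, when it should be 8 to 9. I also tried converting the columns to datetime, and then it gave me TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns]
    @gulbazkhan I would run the full code with the input dataframe I included in my answer and work out where your actual data might differ. It is 100% correct for the sample data in your question.
    OK. Thanks, that works. The data needs to be sorted first.
    This is not a good result. I should get 10:00, and it gives 10:30. Everything else is fine. Same with 16:30.
    Problem solved. Thanks, by the way.
    When running it I got ValueError: columns overlap but no suffix specified: Index(['duration'], dtype='object')
    Works perfectly for me. Which line in the code gives that error?
    The last line. That is where it happens. By the way, it solved my problem. The duration is not a big deal. Thanks.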
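The TypeError reported in the comments appears when pd.to_timedelta is applied to columns that are already datetime64. In that case the two columns can be subtracted directly, which yields a timedelta. Below is a minimal sketch with made-up values. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'start_time': ['10:00:00', '16:00:00'],
                   'end_time': ['13:00:00', '22:00:00']})

# Case 1: the columns are still strings, so convert to timedelta and subtract
from_strings = pd.to_timedelta(df['end_time']) - pd.to_timedelta(df['start_time'])

# Case 2: the columns are datetime64, so subtract them directly
start_dt = pd.to_datetime(df['start_time'], format='%H:%M:%S')
end_dt = pd.to_datetime(df['end_time'], format='%H:%M:%S')
from_datetimes = end_dt - start_dt

# Both routes give the duration in whole minutes
minutes_a = (from_strings.dt.total_seconds() // 60).astype(int)
minutes_b = (from_datetimes.dt.total_seconds() // 60).astype(int)
print(minutes_a.tolist(), minutes_b.tolist())
```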
    # Coerce the start and end times to datetime
    df['start_time'] = pd.to_datetime(df['start_time'])
    df['end_time'] = pd.to_datetime(df['end_time'])

    # Group key: a new group starts wherever end_time minus the previous start_time is not 2 hours
    key = df.end_time.sub(df.start_time.shift(1)).ne('2h').cumsum()

    # Find the last entry in each set of padel
    g = df.groupby(key).tail(1).reset_index()

    # Set start_time to the first start_time in each set of padel
    g = g.assign(start_time=df.groupby(key).start_time.head(1).reset_index().loc[:, 'start_time'])

    # Calculate the duration in minutes
    g = g.iloc[:, :-1].join(
        df.groupby(key)
          .apply(lambda x: (x['end_time'].max() - x['start_time'].min()).total_seconds() / 60)
          .to_frame('duration')
          .reset_index(drop=True)
    )
    
    
    
        padel start_time  end_time  duration
    0  Padel 10   08:00:00  09:00:00        60
    1  Padel 10   10:00:00  13:00:00       180
    2  Padel 10   16:00:00  22:00:00       360