Python: aggregating time intervals per date
I have a DataFrame with two timestamp columns, start and end. The goal is to sum up the hours logged on each given date. So in my example, I would build the date range and accumulate the hours across multiple entries:
2014-08-28 -> 7 hrs
2014-08-29 -> 10 hrs + 1 hr 15 min => 11 hrs 15 mins
2014-08-30 -> 24 hrs
2014-08-31 -> 24 hrs
2014-09-01 -> 17 hrs + 4 hrs => 21 hrs
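I tried using timedelta, but it only gives the total elapsed hours and does not split them per day.
I also tried exploding the rows (i.e. splitting each row by day), but I could only get that to work at the date level, not at the timestamp level.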
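Any suggestions would be greatly appreciated.

Hope this helps; I think you can adapt it to your purpose. The idea is to work day by day, storing each date and its accumulated time in a dict. If the start and end fall on the same day, just record the difference. Otherwise, record the time from the start to the first midnight, add 24 hours for each full day in between, and record the time from the last midnight to the end. FYI: I think the result for 2014-09-01 should be 21 hours.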
from collections import defaultdict
from datetime import datetime, time, timedelta

s = [('2014-08-28 17:00:00', '2014-08-29 22:00:00'),
     ('2014-08-29 10:45:00', '2014-09-01 17:00:00'),
     ('2014-09-01 15:00:00', '2014-09-01 19:00:00')]

def aggregate(intervals):
    # Maps each date to the total time logged on that date.
    store = defaultdict(timedelta)
    for start_str, end_str in intervals:
        start = datetime.strptime(start_str, "%Y-%m-%d %H:%M:%S")
        end = datetime.strptime(end_str, "%Y-%m-%d %H:%M:%S")
        start_date = start.date()
        end_date = end.date()
        if start_date == end_date:
            store[start_date] += end - start
        else:
            # Adding a timedelta (rather than start.day + 1) stays valid at month ends.
            midnight = datetime.combine(start_date + timedelta(days=1), time.min)
            store[start_date] += midnight - start
            # Each full day strictly between the start and end dates counts 24 hours.
            for i in range(1, (end_date - start_date).days):
                store[start_date + timedelta(days=i)] += timedelta(hours=24)
            last_midnight = datetime.combine(end_date, time.min)
            store[end_date] += end - last_midnight
    return store

r = aggregate(s)
for day in r:
    print(day, r[day])
Output:
2014-08-28 7:00:00
2014-08-29 1 day, 11:15:00
2014-08-30 1 day, 0:00:00
2014-08-31 1 day, 0:00:00
2014-09-01 21:00:00
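The totals above print in timedelta's default form (for example 1 day, 11:15:00), while the question lists them as hours and minutes. A minimal sketch of a formatter follows; the helper name format_hours is hypothetical and not part of the original answer.

```python
from datetime import timedelta

def format_hours(td):
    """Render a timedelta as total hours and minutes (hypothetical helper)."""
    # Work in whole minutes; any leftover seconds are truncated.
    total_minutes = int(td.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    if minutes == 0:
        return f"{hours} hrs"
    return f"{hours} hrs {minutes} mins"

print(format_hours(timedelta(days=1, hours=11, minutes=15)))  # 35 hrs 15 mins
print(format_hours(timedelta(hours=21)))                      # 21 hrs
```

Days are folded into the hour count, so a date with overlapping entries can show more than 24 hours.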
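You can use pd.date_range to generate minute-by-minute timestamps for each interval, then count the minutes that fall on each date and convert the counts to timedeltas.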
                start                 end
0 2014-08-28 17:00:00 2014-08-29 22:00:00
1 2014-08-29 10:45:00 2014-09-01 17:00:00
2 2014-09-01 15:00:00 2014-09-01 19:00:00
import pandas as pd

# Build minute-by-minute timestamps for each row and flatten them into one Series of dates.
# [:-1] drops the end timestamp: date_range includes both endpoints, which would add one extra minute per row.
a = pd.Series(sum(df.apply(lambda x: pd.date_range(x['start'], x['end'], freq='min')[:-1].tolist(), axis=1).tolist(), [])).dt.date
# Count the minutes per date and convert the counts to timedeltas.
a.value_counts().apply(lambda x: pd.to_timedelta(x, unit='min'))
Output:
2014-08-29 1 days 11:15:00
2014-08-30 1 days 00:00:00
2014-08-31 1 days 00:00:00
2014-09-01 0 days 21:00:00
2014-08-28 0 days 07:00:00
dtype: timedelta64[ns]
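Enumerating every minute is simple, but it materializes one timestamp per minute and only resolves to whole minutes. As an alternative, here is a sketch that splits each interval at the midnights it crosses and sums the exact durations per date. The function name hours_per_day and its parameters are assumptions made for illustration, and it assumes end is never earlier than start.

```python
import pandas as pd

def hours_per_day(df, start_col="start", end_col="end"):
    """Split each [start, end) interval at midnight and sum durations per date."""
    pieces = []
    starts = pd.to_datetime(df[start_col])
    ends = pd.to_datetime(df[end_col])
    for start, end in zip(starts, ends):
        # Midnights after the start and up to the end are the cut points;
        # date_range is empty when the interval stays within one day.
        cuts = pd.date_range(start.normalize() + pd.Timedelta(days=1), end, freq="D")
        edges = [start, *cuts, end]
        for left, right in zip(edges[:-1], edges[1:]):
            if right > left:  # skip the zero-length piece when end is exactly midnight
                pieces.append((left.date(), right - left))
    out = pd.DataFrame(pieces, columns=["date", "duration"])
    return out.groupby("date")["duration"].sum()

df = pd.DataFrame({
    "start": ["2014-08-28 17:00:00", "2014-08-29 10:45:00", "2014-09-01 15:00:00"],
    "end":   ["2014-08-29 22:00:00", "2014-09-01 17:00:00", "2014-09-01 19:00:00"],
})
totals = hours_per_day(df)
print(totals)
```

The result is a Series indexed by date with timedelta values, so it works at any timestamp resolution and does one piece per day crossed rather than one row per minute.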