Python: How to find timestamp ranges and gaps - ideas?
I have an IoT device that has been running for 18 months, and there is a lot of data to analyse. The device has been switched on and off at various times, and I want to work out when it was on using timestamps in the following format, with each sample taken at one-minute intervals:
08-01-01 10:00
08-01-01 10:01
08-01-01 10:00
08-01-02 03:10
08-01-02 03:11
Ideally, I would like to produce a report in the following format:
Time session 1 - 08-01-01 10:00 08-01-01 10:02 Session 1 ran for three minutes
Time session 2 - 08-01-02 03:10 08-01-02 03:11 Session 2 ran for 2 minutes
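The problem is that I have more than 150k timestamps and cannot come up with a fast way to sort through the data. At the moment I build a second array that holds every timestamp from the first to the last one, compare the original timestamp array against this master array, and set a flag for each match. It works, but it is not very efficient, and I am trying to come up with a better way to analyse this data.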
import csv
from datetime import datetime, timedelta

# Read the first column of the CSV, which holds the raw timestamps
with open('HomeOfficeApr.csv', newline='') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    orgtimestamp = []
    for row in readCSV:
        ts = row[0]
        orgtimestamp.append(ts)

# Drop the last 9 characters of each raw timestamp
for elements in range(len(orgtimestamp)):
    orgtimestamp[elements] = orgtimestamp[elements][:-9]
    # print(orgtimestamp[elements])

print("First time stamp")
print(orgtimestamp[0])

print("Create time stamp range")

def datetime_range(start, end, delta):
    current = start
    if not isinstance(delta, timedelta):
        delta = timedelta(**delta)
    while current < end:
        yield current
        current += delta

# Timestamps hard coded - need to change to first and last timestamp
start = datetime(2017, 4, 13, 8, 30)
end = datetime(2018, 12, 31, 12, 0)

gentimestamp = []
# this unlocks the following interface:
for dt in datetime_range(start, end, {'days': 0, 'minutes': 1}):
    gentimestamp.append(str(dt))

# Drop the trailing ':SS' so each entry reads 'YYYY-MM-DD HH:MM'
for i in range(len(gentimestamp)):
    gentimestamp[i] = gentimestamp[i][:-3]

print("Compare time stamp")
print(len(gentimestamp))

# One flag per generated minute: "Y" by default, "X" once a match is found
CompareTimeStampArray = [None] * len(gentimestamp)
for i in range(len(CompareTimeStampArray)):
    CompareTimeStampArray[i] = "Y"

for i in range(len(orgtimestamp)):
    for y in range(len(gentimestamp)):
        if (orgtimestamp[i][0:4]) == (gentimestamp[y][0:4]):
            # print("Match year")
            # print(orgtimestamp[i][0:4])
            # print(gentimestamp[y][0:4])
            if (orgtimestamp[i][5:7]) == (gentimestamp[y][5:7]):
                # print("Match month")
                # print(orgtimestamp[i][5:7])
                # print(gentimestamp[y][5:7])
                if (orgtimestamp[i][8:10]) == (gentimestamp[y][8:10]):
                    # print("Match day")
                    # print(orgtimestamp[i][8:10])
                    # print(gentimestamp[y][8:10])
                    if (orgtimestamp[i][11:13]) == (gentimestamp[y][11:13]):
                        # print("Match hour")
                        # print(orgtimestamp[i][11:13])
                        # print(gentimestamp[y][11:13])
                        if (orgtimestamp[i][14:16]) == (gentimestamp[y][14:16]):
                            print("Match minute")
                            print("Date & time match")
                            print(orgtimestamp[i])
                            print(gentimestamp[y])
                            print(i)
                            print(y)
                            print("")
                            # Flag the generated minute that was found in the log
                            CompareTimeStampArray[y] = "X"
                            break

print("Finished")
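For comparison, the same "Y"/"X" flags can be produced without the nested loop by putting the logged timestamps into a set, so that each generated minute is checked with a single lookup instead of a scan over the whole log. This is only a minimal sketch, assuming both lists hold 'YYYY-MM-DD HH:MM' strings as in the code above; flag_present is a hypothetical helper name and the sample values are made up for illustration:

```python
def flag_present(orgtimestamp, gentimestamp):
    """Return "X" for each generated minute found in the log, else "Y"."""
    logged = set(orgtimestamp)  # set membership is a constant-time lookup on average
    return ["X" if stamp in logged else "Y" for stamp in gentimestamp]


# Made-up example values, not data from the device
org = ["2017-04-13 08:30", "2017-04-13 08:31", "2017-04-13 08:33"]
gen = ["2017-04-13 08:30", "2017-04-13 08:31",
       "2017-04-13 08:32", "2017-04-13 08:33"]

flags = flag_present(org, gen)
print(flags)  # ['X', 'X', 'Y', 'X']
```

With 150k logged samples this does one pass over each list rather than one pass over the generated range per logged sample.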
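The pandas library may help here. It lets you load a CSV file into a spreadsheet-like format on which you can perform column operations. It also handles time formats well. Try this:
Edit: now takes the new input format into account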
import pandas as pd
import numpy as np

# make up some timestamps in ascending order
stamps = ['08-01-01 10:00', '08-01-01 10:01', '08-01-01 10:02',
          '08-01-02 03:10', '08-01-02 03:11', '08-02-15 13:34',
          '08-03-06 09:06', '08-03-06 09:07', '08-03-06 09:08']

# get original timestamps into a pandas dataframe
ts = pd.DataFrame(stamps, columns=['orig_timestamp'])

# assuming that the timestamps are in year-month-day hour:minute format
ts['Timestamp'] = pd.to_datetime(ts['orig_timestamp'], format='%y-%m-%d %H:%M')

# get the timedelta between consecutive rows, set to 0 for first row
ts['Timedelta'] = ts['Timestamp'].diff().fillna(pd.Timedelta(0))

# get the timedelta in minutes (as a float)
ts['minute_delta'] = ts['Timedelta'].dt.total_seconds() / 60

# set to True whenever a new Session begins, i.e. timedelta is not one minute
ts['newSession'] = np.where(ts['minute_delta'] == 1, False, True)

# cumulative sum of session starts
ts['SessionID'] = ts['newSession'].cumsum()

# group timestamps by SessionID and count their duration
grouped_timestamps = (
    ts[['orig_timestamp', 'SessionID']]
    .groupby(['SessionID'])
    .agg(['first', 'last', 'count'])
)

print(ts[['orig_timestamp', 'minute_delta', 'newSession', 'SessionID']])
print(grouped_timestamps)
The final data frame looks like this:
   orig_timestamp  minute_delta  newSession  SessionID
0  08-01-01 10:00           0.0        True          1
1  08-01-01 10:01           1.0       False          1
2  08-01-01 10:02           1.0       False          1
3  08-01-02 03:10        1028.0        True          2
4  08-01-02 03:11           1.0       False          2
5  08-02-15 13:34       63983.0        True          3
6  08-03-06 09:06       28532.0        True          4
7  08-03-06 09:07           1.0       False          4
8  08-03-06 09:08           1.0       False          4
Grouping produces a data frame whose 'count' column is the number of minutes each session ran:
           orig_timestamp
                    first            last count
SessionID
1          08-01-01 10:00  08-01-01 10:02     3
2          08-01-02 03:10  08-01-02 03:11     2
3          08-02-15 13:34  08-02-15 13:34     1
4          08-03-06 09:06  08-03-06 09:08     3
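The same idea (start a new session whenever the gap to the previous sample is not exactly one minute) can also be sketched with only the standard library and printed in the report format from the question. This is a rough sketch under a few assumptions: the timestamps use the year-month-day hour:minute format assumed above, duplicate entries can simply be dropped, and the input may be unsorted. find_sessions is a hypothetical helper name, not something from the question or from pandas:

```python
from datetime import datetime, timedelta

FMT = "%y-%m-%d %H:%M"          # assumed year-month-day hour:minute format
ONE_MINUTE = timedelta(minutes=1)


def find_sessions(stamps, step=ONE_MINUTE):
    """Return (start, end) datetime pairs for runs of consecutive samples."""
    # Parse, drop duplicates and sort so repeated or out-of-order rows
    # do not split a run
    times = sorted({datetime.strptime(s, FMT) for s in stamps})
    sessions = []
    for t in times:
        if sessions and t - sessions[-1][1] == step:
            sessions[-1][1] = t        # exactly one step later: extend the session
        else:
            sessions.append([t, t])    # first sample or a gap: start a new session
    return [(start, end) for start, end in sessions]


# Sample timestamps from the question, including the repeated 10:00 entry
stamps = ['08-01-01 10:00', '08-01-01 10:01', '08-01-01 10:00',
          '08-01-02 03:10', '08-01-02 03:11']

for n, (start, end) in enumerate(find_sessions(stamps), 1):
    minutes = (end - start) // ONE_MINUTE + 1   # count both end points
    print(f"Time session {n} - {start.strftime(FMT)} {end.strftime(FMT)} "
          f"Session {n} ran for {minutes} minutes")
```

Sorting 150k timestamps and walking them once is cheap, so no master array of every possible minute is needed. Whether duplicates should be dropped or treated as a logging fault is a decision for the data owner; the sketch simply ignores them.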
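I seem to have misunderstood the question. I thought you only needed to count the number of minutes each session ran. What else do you need to do?

Thanks for the quick response. It is mainly about taking the original timestamp data, i.e. YY-MM-DD HH-MM, and working out how long the IoT device ran before it was switched off. The device may run for a few days, then be off for a few hours, then restart, so I am trying to find the start and end times of each period in which the device was on. In the log you see timestamps at one-minute intervals: a start timestamp, then samples one minute apart, then an end timestamp before the shutdown. When the device is switched on again, logging restarts. I will take a look at your idea later and see whether I can get it to work.

@BrendonShaw Does the solution not work, or do the results not match what you expected? Not having the "Time session x" column before the edit made things harder. Also, are there really duplicate entries in the log?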