Python: How to find timestamp ranges and intervals - ideas?


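I have an IoT device that has been running for 18 months, and there is a lot of data to analyse. The device has been switched on and off at various times, and I want to work out when it was on using timestamps in the following format, with each sample taken at one-minute intervals: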

08-01-01 10:00
08-01-01 10:01
08-01-01 10:00
08-01-02 03:10 
08-01-02 03:11
Ideally, I would like to generate a report in the following format:

Time session 1 - 08-01-01 10:00  08-01-01 10:02   Session 1 ran for three minutes
Time session 2 - 08-01-02 02:10  08-01-02 03:11   Session 2 ran for 2 minutes
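The problem is that I have more than 150k timestamps and can't think of a fast way to sort through the data. At the moment I'm using a second array that holds every minute from the first timestamp to the last. I then compare the original timestamp array against this master array and set a flag on each match. It works, but it isn't very efficient, and I'm trying to come up with a better way to analyse this data.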

import csv
from datetime import date, datetime, timedelta

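# the 'rU' mode is gone from recent Python 3 releases; newline='' is what the csv module expects
with open('HomeOfficeApr.csv', 'r', newline='') as csvfile: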
    readCSV = csv.reader(csvfile, delimiter=',')
    orgtimestamp = []
    for row in readCSV:
        ts = row[0]
        orgtimestamp.append(ts)


for elements in range(len(orgtimestamp)):
    orgtimestamp[elements]=orgtimestamp[elements][:-9]
    #   print(timestamp[elements])


print("First time stamp")
print(orgtimestamp[0])
print("Create time stamp range")


def datetime_range(start, end, delta):
   current = start
   if not isinstance(delta, timedelta):
      delta = timedelta(**delta)
   while current < end:
      yield current
      current += delta

# Timestamps hard coded - need to change to first and last timestamp
# (module level, not inside the generator; 04 is not a valid integer literal in Python 3)
start = datetime(2017, 4, 13, 8, 30)
end = datetime(2018, 12, 31, 12, 0)
gentimestamp = []

#this unlocks the following interface:
for dt in datetime_range(start, end, {'days':0, 'minutes':1}):
    gentimestamp.append(str(dt))


for i in range(len(gentimestamp)):
    gentimestamp[i]=gentimestamp[i][:-3]

print("Compare time stamp")
print(len(gentimestamp))
CompareTimeStampArray = [None] *  len(gentimestamp)
for i in range(len(CompareTimeStampArray)):
    CompareTimeStampArray[i] = "Y"

for i in range(len(orgtimestamp)):
    for y in range(len(gentimestamp)):
        if (orgtimestamp[i][0:4]) == (gentimestamp[y][0:4]):
            #print("Match year")
            #print(orgtimestamp[i][0:4])
            #print(gentimestamp[y][0:4])
            if (orgtimestamp[i][5:7]) == (gentimestamp[y][5:7]):
                #print("Match month")
                #print(orgtimestamp[i][5:7])
                #print(gentimestamp[y][5:7])
                if (orgtimestamp[i][8:10]) == (gentimestamp[y][8:10]):
                    #print("Match day")
                    #print(orgtimestamp[i][8:10])
                    #print(gentimestamp[y][8:10])
                    if (orgtimestamp[i][11:13]) == (gentimestamp[y][11:13]):
                        #print("Match hour")
                        #print(orgtimestamp[i][11:13])
                        #print(gentimestamp[y][11:13])
                        if (orgtimestamp[i][14:16]) == (gentimestamp[y][14:16]):
                            print("Match minute")
                            print("Date & time match")
                            print(orgtimestamp[i])
                            print(gentimestamp[y])
                            print(i)
                            print(y)
                            print("")
                            # flag the generated minute (index y), since the
                            # flag array is as long as gentimestamp
                            CompareTimeStampArray[y] = "X"
                            break

print("Finished")

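The pandas library might help here. It lets you load a CSV file into a spreadsheet-like format on which you can perform column operations. It also handles time formats well. Try this: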

Edit: now takes the new input format into account

import pandas as pd
import numpy as np

# make up some timestamps in ascending order
stamps = ['08-01-01 10:00', '08-01-01 10:01', '08-01-01 10:02', 
          '08-01-02 03:10', '08-01-02 03:11', '08-02-15 13:34', 
          '08-03-06 09:06', '08-03-06 09:07', '08-03-06 09:08', ]

# get original timestamps into a pandas dataframe
ts = pd.DataFrame(stamps, columns=['orig_timestamp'])
# assuming that the timestamps are in year-month-day hour:minute format
ts['Timestamp'] = pd.to_datetime(ts['orig_timestamp'], format='%y-%m-%d %H:%M')
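# get the timedelta between consecutive rows, set to 0 for first row
ts['Timedelta'] = ts['Timestamp'].diff().fillna(pd.Timedelta(0))
# get the timedelta in minutes as a float; newer pandas releases reject
# astype('timedelta64[m]'), so go through total_seconds() instead
ts['minute_delta'] = ts['Timedelta'].dt.total_seconds() / 60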
# set to True whenever a new Session begins, i.e. timedelta is not one minute
ts['newSession'] = np.where(ts['minute_delta'] == 1, False, True)
# cumulative sum of session starts
ts['SessionID'] = ts['newSession'].cumsum()
# group timestamps by SessionID and count their duration
grouped_timestamps = ts[['orig_timestamp', 'SessionID']].\
                         groupby(['SessionID']).agg(['first', 'last', 'count'])
print(ts[['orig_timestamp', 'minute_delta', 'newSession', 'SessionID']])
print(grouped_timestamps)
The final dataframe looks like this:

   orig_timestamp  minute_delta  newSession  SessionID
0  08-01-01 10:00           0.0        True          1
1  08-01-01 10:01           1.0       False          1
2  08-01-01 10:02           1.0       False          1
3  08-01-02 03:10        1028.0        True          2
4  08-01-02 03:11           1.0       False          2
5  08-02-15 13:34       63983.0        True          3
6  08-03-06 09:06       28532.0        True          4
7  08-03-06 09:07           1.0       False          4
8  08-03-06 09:08           1.0       False          4
Grouping produces a dataframe whose "count" column is the number of minutes each session ran:

           orig_timestamp                      
                    first            last count
SessionID                                      
1          08-01-01 10:00  08-01-01 10:02     3
2          08-01-02 03:10  08-01-02 03:11     2
3          08-02-15 13:34  08-02-15 13:34     1
4          08-03-06 09:06  08-03-06 09:08     3
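
To get from that grouped frame to the report layout the question asks for, something along these lines might work. It is only a sketch: the helper name `session_report` and its `gap` parameter are made up for illustration, and it assumes the stamps are in `%y-%m-%d %H:%M` format with one sample per minute. It also sorts and drops duplicate stamps first, because the sample data in the question contains a repeated entry.

```python
import pandas as pd

def session_report(stamps, gap=pd.Timedelta(minutes=1)):
    """Return one report line per session (hypothetical helper)."""
    # Parse, de-duplicate and sort so diff() only sees increasing stamps
    ts = pd.Series(pd.to_datetime(stamps, format='%y-%m-%d %H:%M'))
    ts = ts.drop_duplicates().sort_values().reset_index(drop=True)
    # A new session starts wherever the step from the previous stamp is
    # not exactly `gap`; the first row compares NaT != gap, which is True
    session_id = (ts.diff() != gap).cumsum()
    lines = []
    for sid, grp in ts.groupby(session_id):
        first, last = grp.iloc[0], grp.iloc[-1]
        lines.append(
            'Time session {} - {:%y-%m-%d %H:%M}  {:%y-%m-%d %H:%M}   '
            'Session {} ran for {} minutes'.format(sid, first, last, sid, len(grp)))
    return lines

if __name__ == '__main__':
    sample = ['08-01-01 10:00', '08-01-01 10:01', '08-01-01 10:00',
              '08-01-01 10:02', '08-01-02 03:10', '08-01-02 03:11']
    print('\n'.join(session_report(sample)))
```

As in the grouped frame above, the "ran for N minutes" figure is the number of one-minute samples in the session, not the difference between the last and first stamp.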

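I seem to have misunderstood the question. I thought you only needed to count the number of minutes each session ran. What else do you need?

Thanks for the quick response. It's mainly about taking the raw timestamp data, i.e. YY-MM-DD HH-MM, and working out how long the IoT device ran before it was switched off. The device may run for a few days, be off for a few hours and then restart, so I'm trying to find the start and end times of each period when the device is on. In the log you can see timestamps at one-minute intervals, then a start timestamp, one-minute intervals, then the end timestamp before it shuts down. When it is switched on again, it restarts logging. I'll have a look at your idea later and see if I can get it to work.

@BrendonShaw Did the solution not work, or were the results not what you expected? The missing "Time session x" column before the edit made things harder. Also, are there really duplicate entries in the log?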
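
If the log really does contain duplicate or out-of-order entries, one possible way to handle them without pandas is a single pass over the sorted, de-duplicated stamps. This is a rough sketch using only the standard library. The function name `find_sessions`, the `FMT` constant and the `step` parameter are illustrative assumptions, not something from the posts above, and the format string assumes `YY-MM-DD HH:MM` as in the sample data.

```python
from datetime import datetime, timedelta

FMT = '%y-%m-%d %H:%M'  # assumed log format: YY-MM-DD HH:MM

def find_sessions(stamps, step=timedelta(minutes=1)):
    """Group timestamp strings into (start, end, sample_count) tuples."""
    # A set drops duplicate entries; sorting makes one pass sufficient
    times = sorted({datetime.strptime(s.strip(), FMT) for s in stamps})
    sessions = []
    for t in times:
        if sessions and t - sessions[-1][1] == step:
            # Exactly one step after the previous sample: extend the session
            start, _, count = sessions[-1]
            sessions[-1] = (start, t, count + 1)
        else:
            # First sample, or a gap: the device was switched on again
            sessions.append((t, t, 1))
    return sessions

if __name__ == '__main__':
    log = ['08-01-01 10:00', '08-01-01 10:01', '08-01-01 10:00',
           '08-01-02 03:10 ', '08-01-02 03:11']
    for n, (start, end, count) in enumerate(find_sessions(log), 1):
        print('Time session {} - {}  {}   Session {} ran for {} minutes'.format(
            n, start.strftime(FMT), end.strftime(FMT), n, count))
```

The work is dominated by the sort, so it scales as O(n log n) rather than comparing every logged stamp against every generated minute, which should be comfortable for around 150k timestamps.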