Python: sum of datetime differences based on column values


I have a dataframe that looks like:

        field1    field2    field3
time
  t1         1         1         1
  t2         1         1         0
  t3         2         3         1
  t4         3         3         0
  t5         1         2         0     
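
The time has the format yyyy-mm-dd hh:mm:ss and currently serves as the index of the dataframe.

field1 and field2 identify an item, so the tuple (field1, field2) corresponds to a specific sensor somewhere in the world. field3 is the value of that sensor at the given time, and is either 0 or 1.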

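I would like to group the dataframe by (field1, field2) and sum up the total time each sensor spent at each value of field3. So, if t1 = '2016-07-20 00:00:00' and t2 = '2016-07-20 00:01:00', and the current time is '2016-07-20 00:03:00', I would have a new dataframe that looks like: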

            field3=0    field3=1
(1,1)          2 min       1 min
(2,3)            ...         ...
(3,3)            ...         ...  
(1,2)            ...         ...
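
I'm assuming that from t1 to t2 the value of field3 is 1, and from t2 onward it is 0, since (1,1) doesn't show up again in the dataframe. The 1 min comes from t2 - t1, and the 2 min comes from current_time - t2.

The 2 min and 1 min can be in any format (total minutes/seconds, a timedelta, or whatever).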
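
As a small worked check of the arithmetic above, the two durations for sensor (1,1) can be computed directly from the timestamps given in the question. This is only a sketch: the variable names time_at_one, time_at_zero and the two minutes_* names are made up for illustration.

```python
import pandas as pd

# Timestamps taken from the question.
t1 = pd.Timestamp("2016-07-20 00:00:00")
t2 = pd.Timestamp("2016-07-20 00:01:00")
current_time = pd.Timestamp("2016-07-20 00:03:00")

# field3 == 1 from t1 until t2, then field3 == 0 from t2 until "now".
time_at_one = t2 - t1
time_at_zero = current_time - t2

# Either representation would do: a Timedelta, or a plain number of minutes.
minutes_at_one = time_at_one.total_seconds() / 60
minutes_at_zero = time_at_zero / pd.Timedelta(minutes=1)
```

Both time_at_one and time_at_zero are pandas Timedelta objects, so they can be summed per sensor later on.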

I have tried the following:

import pandas as pd
from collections import defaultdict, namedtuple

# so I can create a defaultdict(Field3) and save some logic
class Field3(object):
    def __init__(self):
        self.zero = pd.Timedelta('0 days')
        self.one = pd.Timedelta('0 days')

# used as the dictionary key that maps to a Field3
Sensor = namedtuple('Sensor', 'field1 field2')

# the dataframe mentioned above
df = pd.DataFrame(...)

# iterate through each row of the dataframe and map from (field1, field2) to
# field3, adding time based on the value of field3 in the frame and the
# time difference between this row and the next
rows = list(df.iterrows())
sensor_to_field3 = defaultdict(Field3)
for i in range(len(rows) - 1):
    time, values = rows[i]
    next_time = rows[i + 1][0]
    sensor = Sensor(field1=values['field1'], field2=values['field2'])
    if values['field3']:
        sensor_to_field3[sensor].one += next_time - time
    else:
        sensor_to_field3[sensor].zero += next_time - time
# wrap each value in a list so every sensor becomes a one-row column
sensor_to_field3 = {k: [v] for k, v in sensor_to_field3.items()}
result = pd.DataFrame(sensor_to_field3, index=[0])
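
This mostly gets me there (although for now it only works when the whole table contains a single sensor, and I'd rather not deal with that if there is a better approach).

I feel like there should be a better way to go about this. Something like grouping on field1, field2 and then aggregating the timedeltas based on field3 and the time index, but I'm not sure how to go about doing that.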

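Managed to get it, in case anyone else runs into something similar. I'm still not sure it's optimal, but it feels better than what I was doing.

I changed the original dataframe to include the time as a column, and just use an integer index.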
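
That change of layout can be sketched with reset_index. The frame below is a stand-in for the real data: the first two timestamps come from the question, the third one (for sensor (2,3)) is a made-up value, and the names indexed and flat are hypothetical.

```python
import pandas as pd

# Stand-in for the original frame: time is the index.
# The third timestamp is invented for illustration; the question does not give t3.
indexed = pd.DataFrame(
    {"field1": [1, 1, 2], "field2": [1, 1, 3], "field3": [1, 0, 1]},
    index=pd.DatetimeIndex(
        ["2016-07-20 00:00:00", "2016-07-20 00:01:00", "2016-07-20 00:02:00"],
        name="time",
    ),
)

# Move the time index into an ordinary column; the new index is 0, 1, 2, ...
flat = indexed.reset_index()
```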

from datetime import datetime

import pandas as pd

def create_time_deltas(dataframe):
    # work on a copy so the caller's rows are not modified in place
    dataframe = dataframe.copy()
    # add a timedelta column
    dataframe['timedelta'] = pd.Timedelta(minutes=0)
    times = dataframe['time']
    labels = dataframe.index
    # iterate by position (the labels inside one group need not be consecutive)
    # and set the timedelta to the difference of the next row and this one
    for pos in range(len(dataframe) - 1):
        dataframe.at[labels[pos], 'timedelta'] = times.iloc[pos + 1] - times.iloc[pos]
    # the last row has no successor, so measure it up to the current time
    dataframe.at[labels[-1], 'timedelta'] = pd.Timestamp(datetime.now()) - times.iloc[-1]
    return dataframe

def group_by_field3_time(dataframe, start=None, stop=None):
    # optionally set time bounds on what to care about
    stop = stop or pd.Timestamp(datetime.now())
    in_range = dataframe['time'] < stop
    if start is not None:
        in_range &= dataframe['time'] > start
    recent = dataframe.loc[in_range]
    # add the timedelta column separately for each (field1, field2) sensor
    by_td = pd.concat([create_time_deltas(group)
                       for _, group in recent.groupby(['field1', 'field2'])])
    # sum the timedeltas for each triple, which can be used later
    by_oc = by_td.groupby(['field1', 'field2', 'field3'])['timedelta'].sum()
    return by_oc

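If anyone can think of a better way to do this I'm all ears, but this definitely feels a lot better than creating dictionaries all over the place.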
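
One possible loop-free variant, offered as a sketch rather than a definitive answer, uses a grouped shift to look up each sensor's next reading and then sums the resulting durations. The function name time_per_value, the duration column, and the sample frame readings are all made up for illustration; the timestamp for sensor (2,3) is invented, since the question does not give it. The sketch assumes time is an ordinary datetime column, as in the layout described above.

```python
import pandas as pd

def time_per_value(df, now):
    """Total time each (field1, field2) sensor spent at each field3 value."""
    df = df.sort_values("time")
    # Next reading of the same sensor; a sensor's last reading lasts until `now`.
    next_time = df.groupby(["field1", "field2"])["time"].shift(-1).fillna(now)
    df = df.assign(duration=next_time - df["time"])
    # One row per sensor, one column per field3 value.
    totals = df.groupby(["field1", "field2", "field3"])["duration"].sum()
    return totals.unstack("field3")

# Hypothetical sample data: only the (1,1) timestamps come from the question.
readings = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-07-20 00:00:00",
        "2016-07-20 00:00:30",
        "2016-07-20 00:01:00",
    ]),
    "field1": [1, 2, 1],
    "field2": [1, 3, 1],
    "field3": [1, 1, 0],
})
now = pd.Timestamp("2016-07-20 00:03:00")
result = time_per_value(readings, now)
```

For sensor (1,1) this reproduces the 2 min at field3=0 and 1 min at field3=1 from the question. A value that a sensor never took shows up as NaT in the result.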
