Python lambda, NumPy: help finding the sum of times

I have the following lists:

Events = [0, 0, 0, 1, 1, 0]
Details = ['Start', 'End', 'Start', 'Start', 'End', 'End']
Time = [0, 1, 4, 5, 10, 16]
I need to group the individual events as follows:

Event 0:
Sum of start times = 0+4 = 4
Sum of end times = 1+16 = 17
Total time spent by event 0 = 17-4 = 13

Event 1:
Sum of start times = 5
Sum of end times = 10
Total time spent by event 1 = 10-5 = 5
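I would like a shorthand way of doing this. With a large number of events and timings, writing out explicit if/for loops the way one would in Java becomes tedious.

Is there an efficient way to do this?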

As one option, you can do the following:

result = {}
for e, d, t in zip(Events, Details, Time):
    result.setdefault(e, {})
    result[e].setdefault(d, 0)
    result[e][d] += t

print(result)
>>> {0: {'Start': 4, 'End': 17}, 1: {'Start': 5, 'End': 10}}
From there it is easy to produce the output you expect.
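A minimal sketch of that last step, assuming the total for an event is simply its End sum minus its Start sum. The helper name total_durations is made up for illustration, and treating a missing 'Start' or 'End' as zero is an assumption, not something the question specifies.

```python
def total_durations(events, details, times):
    """Group times by event and detail, then return End sum minus Start sum."""
    result = {}
    for e, d, t in zip(events, details, times):
        result.setdefault(e, {})
        result[e].setdefault(d, 0)
        result[e][d] += t
    # Assumption: an event with no 'Start' (or no 'End') contributes 0 there.
    return {
        e: sums.get('End', 0) - sums.get('Start', 0)
        for e, sums in result.items()
    }


Events = [0, 0, 0, 1, 1, 0]
Details = ['Start', 'End', 'Start', 'Start', 'End', 'End']
Time = [0, 1, 4, 5, 10, 16]

for event, total in sorted(total_durations(Events, Details, Time).items()):
    print("Total time spent by event {} = {}".format(event, total))
```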

Update:

Thanks to @abarnert:

from collections import Counter

result = {}
for e, d, t in zip(Events, Details, Time):
    result.setdefault(e, Counter())[d] += t
print(result)
>>> {0: Counter({'End': 17, 'Start': 4}), 1: Counter({'End': 10, 'Start': 5})}
Thanks to @AMacK:

result = {}
for e, d, t in zip(Events, Details, Time):
    result.setdefault(e, {}).setdefault(d, []).append(t)

print(result)
>>> {0: {'Start': [0, 4], 'End': [1, 16]}, 1: {'Start': [5], 'End': [10]}}
Best regards,
Artem

With NumPy, you can do it like this:

>>> import numpy as np
>>> Events = np.array([0, 0, 0, 1, 1, 0])
>>> Details = np.array(['Start', 'End', 'Start', 'Start', 'End', 'End'])
>>> Time = np.array([0, 1, 4, 5, 10, 16])
>>> is_start = (Details == 'Start')
>>> sum_start = np.bincount(Events[is_start], Time[is_start])
>>> sum_end = np.bincount(Events[~is_start], Time[~is_start])
>>> durations = sum_end - sum_start
>>> durations
array([ 13.,   5.])
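To make the session above easier to follow: np.bincount(x, weights=w) returns an array whose entry k is the sum of w[i] over all positions where x[i] == k. The sketch below shows that, plus a single-call variant that negates the Start times first. The variant is not from the answer; it is offered as an assumption-laden alternative that sidesteps the two results having different lengths when the highest event id has only a Start or only an End.

```python
import numpy as np

events = np.array([0, 0, 0, 1, 1, 0])
details = np.array(['Start', 'End', 'Start', 'Start', 'End', 'End'])
times = np.array([0, 1, 4, 5, 10, 16])

is_start = details == 'Start'

# Entry k is the sum of the Start times recorded for event k.
sum_start = np.bincount(events[is_start], weights=times[is_start])
print(sum_start.tolist())  # [4.0, 5.0]

# Variant (not from the answer): count Start times as negative and End times
# as positive, so one weighted bincount yields End sum minus Start sum.
signed = np.where(is_start, -times, times)
durations = np.bincount(events, weights=signed)
print(durations.tolist())  # [13.0, 5.0]
```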
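If your data is already in NumPy arrays, this will be considerably faster (~10x) than the approach based on a Python loop. If your data is not yet in NumPy arrays, it will be only slightly faster than the loop (<2x), because walking through large Python lists is slower than the actual counting.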

import numpy as np

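def evcount(events, details, time):
    # Accept either lists or arrays; asarray is a no-op for arrays.
    events = np.asarray(events)
    details = np.asarray(details)
    time = np.asarray(time)

    # nbins is the module-level value defined below; minlength keeps both
    # results the same length even if some event ids never occur.
    is_start = (details == 'Start')
    sum_start = np.bincount(events[is_start], time[is_start], minlength=nbins)
    sum_end = np.bincount(events[~is_start], time[~is_start], minlength=nbins)
    return sum_end - sum_start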

def evcount2(events, details, time):
    result = {}
    for e, d, t in zip(events, details, time):
        result.setdefault(e, {}).setdefault(d, []).append(t)
    return result

n = 20000
nbins = 200

events_arr = np.random.randint(0, nbins, n)
events = events_arr.tolist()
times_arr = np.random.rand(n)
times = times_arr.tolist()
details_arr = np.array(['Start', 'End'])[np.random.randint(0, 2, n)]
details = details_arr.tolist()

def doit_numpy_list():
    evcount(events, details, times)

def doit_numpy_arrays():
    evcount(events_arr, details_arr, times_arr)

def doit_loop():
    evcount2(events, details, times)


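Ack, you beat me to it! You could also change result[e].setdefault(d, 0) to result[e].setdefault(d, []).append(t) and delete the following line, which keeps the component values.

I tried to make it easier to understand, but thanks, you're right.

If you are going to use setdefault, you can use the returned value directly. For example, replace all three lines with one: result.setdefault(e, collections.Counter())[d] += t, or (for @AMacK's version) result.setdefault(e, {}).setdefault(d, []).append(t).

Wow, that's cool. Can someone tell me what setdefault() is doing here? Also, how do I sum the "1" and "0" values and then subtract them? In some cases "0" may be empty or have fewer values than "1". What should be done then?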
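On the setdefault() question, a short sketch may help: dict.setdefault(key, default) returns the value stored under key, storing default first if the key is absent. The second half uses made-up data in which event 1 has no 'Start', and relies on Counter returning 0 for a missing key; whether zero is the right value for a missing side is an assumption.

```python
from collections import Counter

d = {}
# Key absent: the default list is stored under 'a' and returned.
first = d.setdefault('a', [])
first.append(1)
# Key present: the default is ignored and the stored list is returned.
second = d.setdefault('a', [])
second.append(2)
print(d)  # {'a': [1, 2]}

# Made-up data: event 1 has an 'End' but no 'Start'.
events = [0, 0, 1]
details = ['Start', 'End', 'End']
times = [2, 7, 5]

result = {}
for e, det, t in zip(events, details, times):
    result.setdefault(e, Counter())[det] += t

# Counter yields 0 for a key it has never seen, so no KeyError is raised.
durations = {e: c['End'] - c['Start'] for e, c in result.items()}
print(durations)  # {0: 5, 1: 5}
```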
Timings for the benchmark functions (IPython %timeit):

In [34]: %timeit doit_numpy_list()
100 loops, best of 3: 4.03 ms per loop

In [35]: %timeit doit_numpy_arrays()
1000 loops, best of 3: 781 µs per loop

In [36]: %timeit doit_loop()
100 loops, best of 3: 6.18 ms per loop
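Outside IPython, a comparable measurement could be made with the standard timeit module. The sketch below is not the answer's benchmark: the function names, the random seed, and the repeat count are assumptions chosen for illustration, and the loop version is simplified to return one net total per event so the two results can be compared directly. Absolute numbers will vary by machine.

```python
import timeit

import numpy as np


def durations_numpy(events, details, times, nbins):
    """End sum minus Start sum per event, using two weighted bincounts."""
    events = np.asarray(events)
    details = np.asarray(details)
    times = np.asarray(times)
    is_start = details == 'Start'
    sum_start = np.bincount(events[is_start], times[is_start], minlength=nbins)
    sum_end = np.bincount(events[~is_start], times[~is_start], minlength=nbins)
    return sum_end - sum_start


def durations_loop(events, details, times, nbins):
    """Same quantity with a plain Python loop."""
    totals = [0.0] * nbins
    for e, d, t in zip(events, details, times):
        totals[e] += -t if d == 'Start' else t
    return totals


n, nbins = 20000, 200
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
events_arr = rng.integers(0, nbins, n)
times_arr = rng.random(n)
details_arr = np.array(['Start', 'End'])[rng.integers(0, 2, n)]
events, times, details = (a.tolist() for a in (events_arr, times_arr, details_arr))

cases = {
    'numpy, list input': lambda: durations_numpy(events, details, times, nbins),
    'numpy, array input': lambda: durations_numpy(events_arr, details_arr, times_arr, nbins),
    'python loop': lambda: durations_loop(events, details, times, nbins),
}
repeats = 20
for label, func in cases.items():
    seconds = timeit.timeit(func, number=repeats) / repeats
    print("{}: {:.3f} ms per call".format(label, seconds * 1e3))
```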