Python 3: comparing nested dicts (recursion?)


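I'm writing a program that takes a .csv file and creates "metrics" for ticket closure data. Each ticket has one or more time entries; the goal is to get the "delta" (i.e. the time difference) of open -> close and time_start -> time_end on a per-ticket basis. These aren't real variable names, they're just for the purposes of this question.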

Say we have ticket 12345, which has 3 time entries, like this:

Ticket: 12345
Opened: 2016-09-26 00:00:00.000 Closed: 2016-09-27 00:01:00.000
Start: 2016-09-26 00:01:00.000 End: 2016-09-26 00:02:00.000
Ticket: 12345
Opened: 2016-09-26 00:00:00.000 Closed: 2016-09-27 00:01:00.000
Start: 2016-09-26 00:01:00.000 End: 2016-09-26 00:02:00.000
Ticket: 12345
Opened: 2016-09-26 00:00:00.000 Closed: 2016-09-27 00:01:00.000
Start: 2016-09-26 00:01:00.000 End: 2016-09-27 00:02:00.000

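I'd like the program to display a single entry for this ticket, with the "deltas" added up, like this:

Ticket: 12345
Delta open/close ($total time from open to close):
Delta start/end: ($total time of ALL ticket time entries added up)

Here's what I have so far.

.csv example: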

Ticket #,Ticket Type,Opened,Closed,Time Entry Day,Start,End
737385,Software,2016-09-06 12:48:31.680,2016-09-06 15:41:52.933,2016-09-06 00:00:00.000,1900-01-01 15:02:00.417,1900-01-01 15:41:00.417
737318,Hardware,2016-09-06 12:20:28.403,2016-09-06 14:35:58.223,2016-09-06 00:00:00.000,1900-01-01 14:04:00.883,1900-01-01 14:35:00.883
737296,Printing/Scan/Fax,2016-09-06 11:37:10.387,2016-09-06 13:33:07.577,2016-09-06 00:00:00.000,1900-01-01 13:29:00.240,1900-01-01 13:33:00.240
737273,Software,2016-09-06 10:54:40.177,2016-09-06 13:28:24.140,2016-09-06 00:00:00.000,1900-01-01 13:17:00.860,1900-01-01 13:28:00.860
737261,Software,2016-09-06 10:33:09.070,2016-09-06 13:19:41.573,2016-09-06 00:00:00.000,1900-01-01 13:05:00.113,1900-01-01 13:15:00.113
737238,Software,2016-09-06 09:52:57.090,2016-09-06 14:42:16.287,2016-09-06 00:00:00.000,1900-01-01 12:01:00.350,1900-01-01 12:04:00.350
737238,Software,2016-09-06 09:52:57.090,2016-09-06 14:42:16.287,2016-09-06 00:00:00.000,1900-01-01 14:36:00.913,1900-01-01 14:42:00.913
737220,Password,2016-09-06 09:28:16.060,2016-09-06 11:41:16.750,2016-09-06 00:00:00.000,1900-01-01 11:30:00.303,1900-01-01 11:36:00.303
737197,Hardware,2016-09-06 08:50:23.197,2016-09-06 14:02:18.817,2016-09-06 00:00:00.000,1900-01-01 13:48:00.530,1900-01-01 14:02:00.530
736964,Internal,2016-09-06 01:02:27.453,2016-09-06 05:46:00.160,2016-09-06 00:00:00.000,1900-01-01 06:38:00.917,1900-01-01 06:45:00.917
Class Time_Entry.py:

#! /usr/bin/python
from datetime import *

class Time_Entry:

    def __init__(self, ticket_no, time_entry_day, opened, closed, start, end):
        self.ticket_no = ticket_no
        self.time_entry_day = time_entry_day
        self.opened = opened
        self.closed = closed
        self.start = datetime.strptime(start, '%Y-%m-%d %H:%M:%S.%f')
        self.end = datetime.strptime(end, '%Y-%m-%d %H:%M:%S.%f')
        self.total_open_close_delta = 0
        self.total_start_end_delta = 0

    def open_close_delta(self, topen, tclose):
        open_time = datetime.strptime(topen, '%Y-%m-%d %H:%M:%S.%f')
        if tclose != '\\N':
            close_time = datetime.strptime(tclose, '%Y-%m-%d %H:%M:%S.%f')
            self.total_open_close_delta = close_time - open_time

    def start_end_delta(self, tstart, tend):
        start_time = datetime.strptime(tstart, '%Y-%m-%d %H:%M:%S.%f')
        end_time = datetime.strptime(tend, '%Y-%m-%d %H:%M:%S.%f')
        start_end_delta = (end_time - start_time).seconds
        self.total_start_end_delta += start_end_delta
        return (self.total_start_end_delta)

    def add_start_end_delta(self, delta):
        self.total_start_end_delta += delta

    def display(self):
        print('Ticket #: %7.7s Start: %-15s End: %-15s Delta: %-10s' % (self.ticket_no, self.start.time(), self.end.time(), self.total_start_end_delta))
It is called from metrics.py:

#! /usr/bin/python

import csv
import pprint
from Time_Entry import *

file = '/home/jmd9qs/userdrive/metrics.csv'

# setup CSV, load up a list of dicts
reader = csv.DictReader(open(file))
dict_list = []

for line in reader:
    dict_list.append(line)

def load_tickets(ticket_list):
    for i, key in enumerate(ticket_list):
        ticket_no = key['Ticket #']
        time_entry_day = key['Time Entry Day']
        opened = key['Opened']
        closed = key['Closed']
        start = key['Start']
        end = key['End']

        time_entry = Time_Entry(ticket_no, time_entry_day, opened, closed, start, end)
        time_entry.open_close_delta(opened, closed)
        time_entry.start_end_delta(start, end)

        for h, key2 in enumerate(ticket_list):
            ticket_no2 = key2['Ticket #']
            time_entry_day2 = key2['Time Entry Day']
            opened2 = key2['Opened']
            closed2 = key2['Closed']
            start2 = key2['Start']
            end2 = key2['End']
            time_entry2 = Time_Entry(ticket_no2, time_entry_day2, opened2, closed2, start2, end2)

            if time_entry.ticket_no == time_entry2.ticket_no and i != h:
                # add delta and remove second time_entry from dict (no counting twice)
                time_entry2_delta = time_entry2.start_end_delta(start2, end2)
                time_entry.add_start_end_delta(time_entry2_delta)
                del dict_list[h]
        time_entry.display()

load_tickets(dict_list)
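So far this seems to work; however, I get several output lines per ticket instead of a single line with the "deltas" added up. FYI, the program displays its output differently from my example above, which is intentional. See the sample below: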

Ticket #:  738388 Start: 15:24:00.313000 End: 15:35:00.313000 Delta: 2400      
Ticket #:  738388 Start: 16:30:00.593000 End: 16:40:00.593000 Delta: 1260      
Ticket #:  738381 Start: 15:40:00.763000 End: 16:04:00.767000 Delta: 1440      
Ticket #:  738357 Start: 13:50:00.717000 End: 14:10:00.717000 Delta: 1200      
Ticket #:  738231 Start: 11:16:00.677000 End: 11:21:00.677000 Delta: 720       
Ticket #:  738203 Start: 16:15:00.710000 End: 16:31:00.710000 Delta: 2160      
Ticket #:  738203 Start: 09:57:00.060000 End: 10:02:00.060000 Delta: 1560      
Ticket #:  738203 Start: 12:26:00.597000 End: 12:31:00.597000 Delta: 900       
Ticket #:  738135 Start: 13:25:00.880000 End: 13:50:00.880000 Delta: 2040      
Ticket #:  738124 Start: 07:56:00.117000 End: 08:31:00.117000 Delta: 2100      
Ticket #:  738121 Start: 07:47:00.903000 End: 07:52:00.903000 Delta: 300       
Ticket #:  738115 Start: 07:15:00.443000 End: 07:20:00.443000 Delta: 300       
Ticket #:  737926 Start: 06:40:00.813000 End: 06:47:00.813000 Delta: 420       
Ticket #:  737684 Start: 18:50:00.060000 End: 20:10:00.060000 Delta: 13380     
Ticket #:  737684 Start: 13:00:00.560000 End: 13:08:00.560000 Delta: 8880      
Ticket #:  737684 Start: 08:45:00        End: 10:00:00        Delta: 9480      
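Note that several tickets appear in more than one line, which is what I don't want.

Any comments on style, conventions, etc. are welcome, as I'm trying to become more "Pythonic".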

The problem here is that with nested loops like the ones you implemented, you check the same ticket more than once. Let me explain it better:

ticket_list = [111111, 111111, 666666, 777777] # lets simplify considering the ids only

# I'm trying to keep the same variable names
for i, key1 in enumerate(ticket_list): # outer loop

    cnt = 1

    for h, key2 in enumerate(ticket_list): # inner loop
        if key1 == key2 and i != h:
            print('>> match on i:', i, '- h:', h)
            cnt += 1

    print('Found', key1, cnt, 'times')
Look at what it prints for 111111:

>> match on i: 0 - h: 1
Found 111111 2 times
>> match on i: 1 - h: 0
Found 111111 2 times
Found 666666 1 times
Found 777777 1 times
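This happens because 111111 is matched twice: once when the outer loop is at the first position and the inner loop is at the second (i: 0, h: 1), and again when the outer loop is at the second position and the inner loop is at the first (i: 1, h: 0).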
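One way to see the difference is to count each id in a single pass instead of comparing every pair. This is only a small sketch on the simplified id list from above; using collections.Counter here is my choice for illustration, not part of the original code:

```python
from collections import Counter

ticket_list = [111111, 111111, 666666, 777777]  # ids only, as in the example above

# Counter walks the list once, so each id ends up with exactly one total
# and no pair of positions is ever matched twice.
counts = Counter(ticket_list)

for ticket_id, cnt in counts.items():
    print('Found', ticket_id, cnt, 'times')
```

Each id is now reported once, with its total number of occurrences.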


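Proposed solution

A better solution for your problem is to group the entries that belong to the same ticket and then sum their deltas. itertools.groupby is a good fit for this task. I took the liberty of rewriting some of your code.

Here I changed the constructor to accept the dictionary itself, which makes passing the arguments less of a hassle later on. I also removed the method that adds deltas; we will see why later.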

import csv
import itertools
from datetime import datetime

class Time_Entry(object):

    def __init__(self, entry):
        # entry is one row of the .csv file, as a dict produced by csv.DictReader
        self.ticket_no = entry['Ticket #']
        self.time_entry_day = entry['Time Entry Day']
        self.opened = datetime.strptime(entry['Opened'], '%Y-%m-%d %H:%M:%S.%f')
        self.closed = datetime.strptime(entry['Closed'], '%Y-%m-%d %H:%M:%S.%f')
        self.start = datetime.strptime(entry['Start'], '%Y-%m-%d %H:%M:%S.%f')
        self.end = datetime.strptime(entry['End'], '%Y-%m-%d %H:%M:%S.%f')
        # total_seconds() includes whole days, whereas .seconds drops them
        self.total_open_close_delta = int((self.closed - self.opened).total_seconds())
        self.total_start_end_delta = int((self.end - self.start).total_seconds())

    def display(self):
        print('Ticket #: %7.7s Start: %-15s End: %-15s Delta: %-10s' % (self.ticket_no, self.start.time(), self.end.time(), self.total_start_end_delta))
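A note on measuring the deltas: timedelta.seconds holds only the seconds component of the difference and ignores whole days, while total_seconds() returns the full duration. A short sketch using the open and close times of ticket 12345 from the question (the FMT name is just a local helper):

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S.%f'  # local helper name, not from the original code

opened = datetime.strptime('2016-09-26 00:00:00.000', FMT)
closed = datetime.strptime('2016-09-27 00:01:00.000', FMT)

delta = closed - opened        # one day and one minute

print(delta.seconds)           # 60 -> the day is not included
print(delta.total_seconds())   # 86460.0 -> the full duration in seconds
```

For tickets that are opened and closed on the same day both give the same whole number of seconds, so the difference only shows up on tickets that stay open overnight.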
Here we load the data with a list comprehension; the final result is a list of Time_Entry objects:

with open('metrics.csv') as ticket_list:
    time_entry_list = [Time_Entry(line) for line in csv.DictReader(ticket_list)]

print(time_entry_list)
# [<Time_Entry object at 0x101142f60>, <Time_Entry object at 0x10114d048>, <Time_Entry object at 0x1011fddd8>, ... ]
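The grouping step itself is not shown in this copy of the answer. A minimal sketch of how it might look, assuming the entries are first sorted by ticket number and then passed to itertools.groupby; the Entry namedtuple is a hypothetical stand-in for Time_Entry so that the snippet runs on its own, and the delta values are taken from the sample .csv:

```python
import itertools
from collections import namedtuple

# Hypothetical stand-in for Time_Entry, only so this sketch is self-contained
Entry = namedtuple('Entry', ['ticket_no', 'total_start_end_delta'])

time_entry_list = [
    Entry('737385', 2340),
    Entry('737238', 180),
    Entry('737318', 1860),
    Entry('737238', 360),
]

# groupby only merges adjacent items, so sort by the grouping key first
time_entry_list.sort(key=lambda e: e.ticket_no)
tickets_grps = itertools.groupby(time_entry_list, key=lambda e: e.ticket_no)

# Turn each group into a list right away: the group iterators share the
# underlying iterator and are no longer usable once groupby moves on
tickets = [(ticket_id, list(group)) for ticket_id, group in tickets_grps]

for ticket_id, entries in tickets:
    print(ticket_id, len(entries))
```

Without the sort, the two entries for 737238 would end up in two separate groups.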
The final result in tickets is a list of tuples, with the ticket id in the first position and the associated list of Time_Entry objects in the last position:

print(tickets)
# [('737385', [<Time_Entry object at 0x101142f60>]),
#  ('737318', [<Time_Entry object at 0x10114d048>]),
#  ('737238', [<Time_Entry object at 0x1011fdd68>, <Time_Entry object at 0x1011fde80>]),
#  ...]
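The code that produces the final report is also missing from this copy. A sketch of one way to do it, assuming tickets has the shape shown above; Entry is again a hypothetical stand-in for Time_Entry and summarise is a helper name introduced here for illustration:

```python
from collections import namedtuple

# Hypothetical stand-in for Time_Entry with just the two delta attributes
Entry = namedtuple('Entry', ['total_open_close_delta', 'total_start_end_delta'])

tickets = [
    ('737238', [Entry(17359, 180), Entry(17359, 360)]),
    ('737385', [Entry(10401, 2340)]),
]

def summarise(tickets):
    """Return (ticket_id, open_close_total, start_end_total, count) per ticket."""
    return [
        (ticket_id,
         sum(e.total_open_close_delta for e in entries),
         sum(e.total_start_end_delta for e in entries),
         len(entries))
        for ticket_id, entries in tickets
    ]

for ticket_id, open_close, start_end, count in summarise(tickets):
    print('ticket:', ticket_id)
    print('Delta open / close:', open_close)
    print('Delta start / end:', start_end)
    print('(found', count, 'occurrences)')
    print()
```

Because the sums are computed per group, no method for adding deltas is needed on the class. Note that summing the open/close delta over all entries counts that interval once per time entry; if it should be counted once per ticket, take it from the first entry of the group instead.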
Output:

ticket: 736964
Delta open / close: 17012
Delta start / end: 420
(found 1 occurrences)

ticket: 737197
Delta open / close: 18715
Delta start / end: 840
(found 1 occurrences)

ticket: 737220
Delta open / close: 7980
Delta start / end: 360
(found 1 occurrences)

ticket: 737238
Delta open / close: 34718
Delta start / end: 540
(found 2 occurrences)

ticket: 737261
Delta open / close: 9992
Delta start / end: 600
(found 1 occurrences)

ticket: 737273
Delta open / close: 9223
Delta start / end: 660
(found 1 occurrences)

ticket: 737296
Delta open / close: 6957
Delta start / end: 240
(found 1 occurrences)

ticket: 737318
Delta open / close: 8129
Delta start / end: 1860
(found 1 occurrences)

ticket: 737385
Delta open / close: 10401
Delta start / end: 2340
(found 1 occurrences)

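To wrap up: list comprehensions can be very useful, since they let you do a lot of work with a very compact syntax. The Python standard library also contains many ready-to-use tools that can really help you, so get familiar with it.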


Batsu, this is an absolutely excellent answer, and I appreciate you explaining everything in such detail! I probably won't be able to get back to this until Monday; once I've tried your suggestion and verified it, I'll credit you with the answer. Thanks a lot!

I updated the solution: I forgot to sort by ticket id (groupby needs the list to be sorted to work properly).

How about turning
tickets = [(id, [t for t in tickets]) for id, tickets in tickets_grps]
into a generator, i.e.
tickets = ((id, [t for t in tickets]) for id, tickets in tickets_grps)
? It could be more efficient for large data sets. Either way, upvoted for your work.

Sure, that makes sense. Good catch!
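To illustrate the generator suggestion from the comment above, here is a small sketch under the same assumptions as before: the rows are already sorted by ticket id, and plain (id, delta) tuples stand in for Time_Entry objects.

```python
import itertools

# (ticket id, start/end delta) pairs, already sorted by ticket id
rows = [('737238', 180), ('737238', 360), ('737385', 2340)]

tickets_grps = itertools.groupby(rows, key=lambda r: r[0])

# Generator expression: each (id, entries) pair is built only when it is requested
tickets = ((ticket_id, list(group)) for ticket_id, group in tickets_grps)

totals = {ticket_id: sum(delta for _, delta in entries)
          for ticket_id, entries in tickets}
print(totals)

# A generator can be consumed only once, so a second pass finds nothing
leftover = list(tickets)
print(leftover)
```

The generator avoids holding all groups in memory at once, at the cost that tickets cannot be iterated a second time or printed as a list.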