Reading date-based chunks of data from a text file in Python


I am reading HTTP API error logs from a text document and trying to group them by date.

The error logs are in the following format:

#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename
2017-08-04 12:45:55 11.222.33.44 40221 10.200.0.31 5985 HTTP/1.1 GET / 404 - NotFound -
2017-08-04 09:45:55 11.222.33.44 52612 10.200.0.31 5985 HTTP/1.1 GET /reviews 404 - NotFound -  
2017-08-05 01:45:55 11.222.33.44 44272 10.200.0.31 5985 HTTP/1.1 GET / 404 - NotFound -  
2017-08-05 12:45:55 11.222.33.44 36944 10.200.0.31 5985 HTTP/1.1 GET /login 404 - NotFound -  
2017-08-06 12:46:55 11.222.33.44 49104 10.200.0.31 5985 HTTP/1.1 GET / 404 - NotFound -
2017-08-06 12:45:55 11.222.33.44 47129 10.200.0.31 5985 HTTP/1.1 GET /login 404 - NotFound -
2017-08-06 05:45:55 11.222.33.44 35612 10.200.0.31 5985 HTTP/1.1 GET / 404 - NotFound -
2017-08-07 01:45:55 11.222.33.44 57208 10.200.0.31 5985 HTTP/1.1 GET /login.action 404 - NotFound -
I have written Python logic that reads the error-log text file line by line and builds a JSON array from it:

json_data = []
with open('path to text file') as f:
    lines = f.readlines()

for line in lines:

    if line.startswith('#'):
        continue

    data = line.split()
    (Date, Time, Client_IP, Client_Port, Server_IP, Server_Port,
     Protocol_Version, Method, URI, HTTP_Status, SiteId,
     Reason_Phrase, Queue_Name) = data

    json_record = {"Date": Date, "Time": Time, "Client_IP": Client_IP,
                   "Client_Port": Client_Port, "Server_IP": Server_IP,
                   "Server_Port": Server_Port,
                   "Protocol_Version": Protocol_Version, "Method": Method,
                   "URI": URI, "HTTP_Status": HTTP_Status, "SiteId": SiteId,
                   "Reason_Phrase": Reason_Phrase, "Queue_Name": Queue_Name}
    print(json_record)

    json_data.append(json_record)
The json_data array now holds all of the file's records, but I need the records grouped into separate arrays by date.

A solution to this problem would be much appreciated.

You can group the data by date in a dictionary: use the date from each log line as the key and append the records to that key's list.

Ex:

logStr = """#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename
2017-08-04 12:45:55 11.222.33.44 40221 10.200.0.31 5985 HTTP/1.1 GET / 404 - NotFound -
2017-08-04 09:45:55 11.222.33.44 52612 10.200.0.31 5985 HTTP/1.1 GET /reviews 404 - NotFound -  
2017-08-05 01:45:55 11.222.33.44 44272 10.200.0.31 5985 HTTP/1.1 GET / 404 - NotFound -  
2017-08-05 12:45:55 11.222.33.44 36944 10.200.0.31 5985 HTTP/1.1 GET /login 404 - NotFound -  
2017-08-06 12:46:55 11.222.33.44 49104 10.200.0.31 5985 HTTP/1.1 GET / 404 - NotFound -
2017-08-06 12:45:55 11.222.33.44 47129 10.200.0.31 5985 HTTP/1.1 GET /login 404 - NotFound -
2017-08-06 05:45:55 11.222.33.44 35612 10.200.0.31 5985 HTTP/1.1 GET / 404 - NotFound -
2017-08-07 01:45:55 11.222.33.44 57208 10.200.0.31 5985 HTTP/1.1 GET /login.action 404 - NotFound -"""


d = {}
for line in logStr.split("\n"):

    if not line.startswith('#'):
        data = line.split()
        (Date, Time, Client_IP, Client_Port, Server_IP, Server_Port,
         Protocol_Version, Method, URI, HTTP_Status, SiteId,
         Reason_Phrase, Queue_Name) = data
        json_record = {"Date": Date, "Time": Time, "Client_IP": Client_IP,
                       "Client_Port": Client_Port, "Server_IP": Server_IP,
                       "Server_Port": Server_Port,
                       "Protocol_Version": Protocol_Version, "Method": Method,
                       "URI": URI, "HTTP_Status": HTTP_Status,
                       "SiteId": SiteId, "Reason_Phrase": Reason_Phrase,
                       "Queue_Name": Queue_Name}

        if Date not in d:
            d[Date] = [json_record]
        else:
            d[Date].append(json_record)

print(d)
Output:

{'2017-08-06': [{'Reason_Phrase': 'NotFound', 'Server_IP': '10.200.0.31', 'URI': '/', 'Client_Port': '49104', 'HTTP_Status': '404', 'Time': '12:46:55', 'Method': 'GET', 'Client_IP': '11.222.33.44', 'Queue_Name': '-', 'SiteId': '-', 'Server_Port': '5985', 'Date': '2017-08-06', 'Protocol_Version': 'HTTP/1.1'}, {'Reason_Phrase': 'NotFound', 'Server_IP': '10.200.0.31', 'URI': '/login', 'Client_Port': '47129', 'HTTP_Status': '404', 'Time': '12:45:55', 'Method': 'GET', 'Client_IP': '11.222.33.44', 'Queue_Name': '-', 'SiteId': '-', 'Server_Port': '5985', 'Date': '2017-08-06', 'Protocol_Version': 'HTTP/1.1'}, {'Reason_Phrase': 'NotFound', 'Server_IP': '10.200.0.31', 'URI': '/', 'Client_Port': '35612', 'HTTP_Status': '404', 'Time': '05:45:55', 'Method': 'GET', 'Client_IP': '11.222.33.44', 'Queue_Name': '-', 'SiteId': '-', 'Server_Port': '5985', 'Date': '2017-08-06', 'Protocol_Version': 'HTTP/1.1'}], '2017-08-07': [{'Reason_Phrase': 'NotFound', 'Server_IP': '10.200.0.31', 'URI': '/login.action', 'Client_Port': '57208', 'HTTP_Status': '404', 'Time': '01:45:55', 'Method': 'GET', 'Client_IP': '11.222.33.44', 'Queue_Name': '-', 'SiteId': '-', 'Server_Port': '5985', 'Date': '2017-08-07', 'Protocol_Version': 'HTTP/1.1'}], '2017-08-04': [{'Reason_Phrase': 'NotFound', 'Server_IP': '10.200.0.31', 'URI': '/', 'Client_Port': '40221', 'HTTP_Status': '404', 'Time': '12:45:55', 'Method': 'GET', 'Client_IP': '11.222.33.44', 'Queue_Name': '-', 'SiteId': '-', 'Server_Port': '5985', 'Date': '2017-08-04', 'Protocol_Version': 'HTTP/1.1'}, {'Reason_Phrase': 'NotFound', 'Server_IP': '10.200.0.31', 'URI': '/reviews', 'Client_Port': '52612', 'HTTP_Status': '404', 'Time': '09:45:55', 'Method': 'GET', 'Client_IP': '11.222.33.44', 'Queue_Name': '-', 'SiteId': '-', 'Server_Port': '5985', 'Date': '2017-08-04', 'Protocol_Version': 'HTTP/1.1'}], '2017-08-05': [{'Reason_Phrase': 'NotFound', 'Server_IP': '10.200.0.31', 'URI': '/', 'Client_Port': '44272', 'HTTP_Status': '404', 'Time': '01:45:55', 'Method': 
'GET', 'Client_IP': '11.222.33.44', 'Queue_Name': '-', 'SiteId': '-', 'Server_Port': '5985', 'Date': '2017-08-05', 'Protocol_Version': 'HTTP/1.1'}, {'Reason_Phrase': 'NotFound', 'Server_IP': '10.200.0.31', 'URI': '/login', 'Client_Port': '36944', 'HTTP_Status': '404', 'Time': '12:45:55', 'Method': 'GET', 'Client_IP': '11.222.33.44', 'Queue_Name': '-', 'SiteId': '-', 'Server_Port': '5985', 'Date': '2017-08-05', 'Protocol_Version': 'HTTP/1.1'}]}
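As a side note, the same grouping can be written a little more compactly with collections.defaultdict, which removes the need for the "if Date not in d" check, and the grouped result can be serialized with json.dumps. This is just a sketch of that variant using the sample data above; the FIELDS list and the group_by_date helper are names introduced here for illustration, matching the keys used in the answer:

```python
import json
from collections import defaultdict

# Field names in the order they appear in each log line,
# matching the keys used in the answer above.
FIELDS = ["Date", "Time", "Client_IP", "Client_Port", "Server_IP",
          "Server_Port", "Protocol_Version", "Method", "URI",
          "HTTP_Status", "SiteId", "Reason_Phrase", "Queue_Name"]

def group_by_date(text):
    """Group log records into a {date: [record, ...]} dictionary."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip the '#Fields:' header and blank lines
        record = dict(zip(FIELDS, line.split()))
        grouped[record["Date"]].append(record)
    return dict(grouped)

logStr = """#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename
2017-08-04 12:45:55 11.222.33.44 40221 10.200.0.31 5985 HTTP/1.1 GET / 404 - NotFound -
2017-08-04 09:45:55 11.222.33.44 52612 10.200.0.31 5985 HTTP/1.1 GET /reviews 404 - NotFound -
2017-08-05 01:45:55 11.222.33.44 44272 10.200.0.31 5985 HTTP/1.1 GET / 404 - NotFound -"""

grouped = group_by_date(logStr)
print(json.dumps(grouped, indent=2))
```

The defaultdict(list) creates the empty list automatically the first time a new date key is seen, and json.dumps produces valid JSON output instead of the Python dict repr that print(d) gives.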

That's exactly what I wanted… thank you very much :-) — You're welcome. If the answer solved your problem, please accept it. Thanks.