How do I parse IIS logs in a Pythonic way?

parsing iis python

OK, I have some IIS logs that I want to parse with Python (which I'm fairly new to at the moment). A sample IIS log looks like this:

#Software: Microsoft Internet Information Server 6.0 
#Version: 1.0 
#Date: 1998-11-19 22:48:39 
#Fields: date time c-ip cs-username s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs-version cs(User-Agent) cs(Cookie) cs(Referrer) 

1998-11-19 22:48:39 206.175.82.5 - 208.201.133.173 GET /global/images/navlineboards.gif - 200 540 324 157 HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+95) USERID=CustomerA;+IMPID=01234 http://www.loganalyzer.net
1998-11-20 22:55:39 206.175.82.8 - 208.201.133.173 GET /global/something.pdf - 200 540 324 157 HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+95) USERID=CustomerA;+IMPID=01234 http://www.loganalyzer.net
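Because the "#Fields:" directive names every column, one option is to let it drive the parsing instead of hard-coding column positions. The following is a minimal sketch, assuming space-separated values where embedded spaces are already encoded as "+" (as in the user-agent column of the sample). The function name parse_w3c is hypothetical, not a library API.

```python
def parse_w3c(lines):
    """Hypothetical helper: yield one dict per log line, keyed by the names
    from the most recent "#Fields:" directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # Directive line; only "#Fields:" matters for parsing.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        # Values are space-separated; spaces inside values appear as "+".
        yield dict(zip(fields, line.split()))

sample = """#Software: Microsoft Internet Information Server 6.0
#Version: 1.0
#Date: 1998-11-19 22:48:39
#Fields: date time c-ip cs-username s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs-version cs(User-Agent) cs(Cookie) cs(Referrer)

1998-11-19 22:48:39 206.175.82.5 - 208.201.133.173 GET /global/images/navlineboards.gif - 200 540 324 157 HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+95) USERID=CustomerA;+IMPID=01234 http://www.loganalyzer.net
1998-11-20 22:55:39 206.175.82.8 - 208.201.133.173 GET /global/something.pdf - 200 540 324 157 HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+95) USERID=CustomerA;+IMPID=01234 http://www.loganalyzer.net
"""

for rec in parse_w3c(sample.splitlines()):
    print(rec["c-ip"], rec["cs-uri-stem"], rec["sc-status"])
```

Keying each record by field name means a log configured with a different set of columns still parses, as long as it carries its own "#Fields:" line.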

These are only two lines of log data; each log actually has thousands of lines, so this is just a short example.

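From these logs I want to extract some data, such as the client IP addresses that connect most often, the most downloaded files, the most accessed URIs, and so on. Basically, I want to get some statistics. For example, I would like to see something like this: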

file download_count
example1.pdf 9
example2.pdf 6
example3.doc 2

or

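What I'm not sure about is how to approach this in a Pythonic way. At first I thought I would split each log line into a list and append each of those lists to one bigger list (which I think of as a 2D array). Then, when I got to the point of extracting statistics from that big list, I started thinking it would be better to build a dictionary from all the data and count things through the dict keys and values. Is that better than using lists? If lists are the better choice, how should I go about it? What should I search for on Google, and what exactly am I looking for?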
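On the list-versus-dict question: for counting, a dict keyed by the thing being counted is usually the natural fit, and the standard library's collections.Counter is a dict subclass built for exactly this. The sketch below assumes the log lines have already been reduced to (ip, path) pairs; the pairs themselves are hypothetical sample data, not taken from a real log.

```python
from collections import Counter

# Hypothetical (ip, path) pairs, as a per-line parser might produce them.
requests = [
    ("206.175.82.5", "/global/images/navlineboards.gif"),
    ("206.175.82.8", "/global/something.pdf"),
    ("206.175.82.8", "/global/something.pdf"),
    ("206.175.82.5", "/global/something.pdf"),
]

# Counter maps each key to how many times it was seen, so there is no need
# to keep every parsed line in one big list of lists.
downloads = Counter(path for ip, path in requests)
clients = Counter(ip for ip, path in requests)

print("file download_count")
for path, count in downloads.most_common():  # highest counts first
    print(path, count)
```

Since the counters can be fed one line at a time from a generator, memory use stays proportional to the number of distinct paths or IPs rather than to the number of log lines.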

So I'm looking for ideas on how this is usually done. Thanks.

Assuming skip_header(file) returns only the log lines from the file, and parse(line) extracts (ip, path) from a line:

from collections import defaultdict
first = defaultdict(int)                        # path -> hit count
second = defaultdict(lambda: defaultdict(int))  # ip -> path -> hit count
for line in skip_header(file):
    ip, path = parse(line)
    first[path] += 1
    second[ip][path] += 1

First:

print "path count"
for path, count in first.iteritems():
    print "%s %d" % (path, count)

Second:

print "ip path count"
for ip, d in second.iteritems():
    for path, count in d.iteritems():
        print "%s %s %d" % (ip, path, count)
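The answer leaves skip_header and parse undefined. Here is a minimal sketch of both, assuming the column order of the #Fields line in the question (c-ip is the 3rd column, cs-uri-stem the 7th). Both helpers are hypothetical, io.StringIO stands in for a real file opened with open(...), and the log lines are the question's samples cut down to their first columns plus one made-up repeat request.

```python
import io
from collections import defaultdict

def skip_header(lines):
    """Hypothetical helper: yield only data lines, dropping '#' directives and blanks."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield line

def parse(line):
    """Hypothetical helper: return (ip, path), assuming c-ip is column 3 and
    cs-uri-stem is column 7, as in the question's #Fields line."""
    cols = line.split()
    return cols[2], cols[6]

log_file = io.StringIO(
    "#Fields: date time c-ip cs-username s-ip cs-method cs-uri-stem cs-uri-query\n"
    "1998-11-19 22:48:39 206.175.82.5 - 208.201.133.173 GET /global/images/navlineboards.gif -\n"
    "1998-11-20 22:55:39 206.175.82.8 - 208.201.133.173 GET /global/something.pdf -\n"
    # Hypothetical extra request, added only to show a count above 1.
    "1998-11-20 22:56:02 206.175.82.8 - 208.201.133.173 GET /global/something.pdf -\n"
)

first = defaultdict(int)                        # path -> hit count
second = defaultdict(lambda: defaultdict(int))  # ip -> path -> hit count
for line in skip_header(log_file):
    ip, path = parse(line)
    first[path] += 1
    second[ip][path] += 1

print(dict(first))
```

Hard-coded column indices break if the server logs a different set of fields; reading the positions from the "#Fields:" line is the sturdier choice when that matters.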

Google "python IIS parser" and look at the first two hits (the third one is your question).

Thanks Dan. BTW, I used Python 3, so if anyone tries this, you need to use items() instead of iteritems(), and of course print().
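A minimal Python 3 sketch of the two reports from the answer, with items() in place of iteritems() and print() as a function. It also sorts by count, highest first, to match the download_count table in the question; that ordering is an addition, not part of the original answer. The counts below are hypothetical, shaped like first and second in the answer, and format_report is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical counts with the same shape as `first` and `second` in the answer.
first = defaultdict(int, {"/global/something.pdf": 3,
                          "/global/images/navlineboards.gif": 1})
second = defaultdict(lambda: defaultdict(int))
second["206.175.82.8"]["/global/something.pdf"] = 2
second["206.175.82.5"]["/global/something.pdf"] = 1
second["206.175.82.5"]["/global/images/navlineboards.gif"] = 1

def by_count(pairs):
    """Sort (key, count) pairs so the largest counts come first."""
    return sorted(pairs, key=lambda kv: kv[1], reverse=True)

def format_report(first, second):
    """Build both reports as a list of text lines."""
    lines = ["path count"]
    for path, count in by_count(first.items()):  # items(), not iteritems()
        lines.append("%s %d" % (path, count))
    lines.append("ip path count")
    for ip, d in second.items():
        for path, count in by_count(d.items()):
            lines.append("%s %s %d" % (ip, path, count))
    return lines

for line in format_report(first, second):
    print(line)  # print() is a function in Python 3
```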