How to parse pcap headers in Python while preserving header field order

Currently, I am parsing the HTTP headers in a pcap file like this:

import sys
import dpkt

f = open(sys.argv[1], "rb")  # pass in pcap file as argument to script
fout = open("path to header output file", "a")
pcap = dpkt.pcap.Reader(f)

# master holds string to write
master = ""
print "Working ..."
for ts, buf in pcap:
  l2 = dpkt.ethernet.Ethernet(buf)
  if l2.type == dpkt.ethernet.ETH_TYPE_IP:  # only IP (EtherType 2048), no ARP
    l3 = l2.data
    if l3.p == dpkt.ip.IP_PROTO_TCP:  # only TCP
      l4 = l3.data
      if l4.dport == 80 and len(l4.data) > 0:
        try:
          http = dpkt.http.Request(l4.data)
          dict_headers = http.headers
          http_method = http.method
          http_uri = http.uri
          http_body = http.body
          http_version = http.version

          # this is for first line, method + uri, e.g. GET URI
          master += unicode(http_method + ' ' + http_uri + ' ' + 'HTTP/' + http_version + '\n', 'utf-8')

          for key, val in dict_headers.iteritems():
            master += unicode(key + ': ' + val + '\n', 'utf-8')

          master += '\n'
        except Exception:
          # payload is not a parseable HTTP request: keep the raw bytes,
          # replacing anything that is not valid UTF-8 so decoding cannot fail
          master += unicode(l4.data, 'utf-8', 'replace')

# perform writing and closing of files, etc

The problem is that dpkt stores the HTTP fields in an unordered dictionary (http.headers). I need to preserve the order of the fields. Is there a way around this?

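There are two options:

You can change dpkt's code to use an OrderedDict instead of a regular dictionary (I have not tried this). An OrderedDict preserves insertion order.

Parse the HTTP request yourself. Each header line ends with \x0d\x0a (CRLF), and each header name is followed by ':', so you can use split to build a list of header names, in their original order, like this: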

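l5 = l4.data
# header lines sit between the request line and the blank line that ends the head
headers_and_content = l5[l5.index('\x0d\x0a')+2:l5.index('\x0d\x0a\x0d\x0a')].split('\x0d\x0a')
ordered_headers = []
for item in headers_and_content:
    if ':' in item:  # skip malformed lines that have no colon
        ordered_headers.append(item[:item.index(':')])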
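
The manual approach above only collects header names. If the values are needed as well, it could be extended along the following lines. This is a minimal sketch, assuming Python 3, where the TCP payload (`l4.data`) is a `bytes` object; `parse_ordered_headers` is a hypothetical helper name and not part of dpkt, and the sample request is made up for illustration.

```python
def parse_ordered_headers(payload):
    """Return (request_line, [(name, value), ...]) in on-the-wire order."""
    # Only the head (everything before the first blank line) is of interest.
    head, _, _ = payload.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    request_line = lines[0].decode("latin-1")
    headers = []
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if not sep:
            continue  # skip malformed lines that have no colon
        headers.append((name.decode("latin-1").strip(),
                        value.decode("latin-1").strip()))
    return request_line, headers


# Made-up request standing in for l4.data
raw = (b"GET /index.html HTTP/1.1\r\n"
       b"Host: example.com\r\n"
       b"User-Agent: demo\r\n"
       b"Accept: */*\r\n"
       b"\r\n")
request_line, headers = parse_ordered_headers(raw)
print(request_line)
for name, value in headers:
    print(name + ": " + value)
```

A list of tuples is used instead of a dictionary so that repeated header names are kept as separate entries and the original capitalization of each name is left untouched.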