Parsing an HTTP request Authorization header with Python

I need to take a header like this:

 Authorization: Digest qop="chap",
     realm="testrealm@host.com",
     username="Foobear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"

and parse it into this using Python:

{'protocol':'Digest',
  'qop':'chap',
  'realm':'testrealm@host.com',
  'username':'Foobear',
  'response':'6629fae49393a05397450978507c4ef1',
  'cnonce':'5ccc069c403ebaf9f0171e9517f40e41'}

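Is there a library that does this, or something I could look at for inspiration?

I'm doing this on Google App Engine, and I'm not sure whether the Pyparsing library is available there, but if it is the best solution I could probably bundle it with my app.

Currently I'm creating my own MyHeaderParser object and using it with reduce() on the header string. It works, but it is very fragile.

Brilliant solution by nadia below: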

import re

# Raw string so that \w reaches the regex engine unchanged.
reg = re.compile(r'(\w+)[=] ?"?(\w+)"?')

s = """Digest
realm="stackoverflow.com", username="kixx"
"""

print(dict(reg.findall(s)))

If those components will always be there, in that order, then a regex will do the trick:

test = '''Authorization: Digest qop="chap", realm="testrealm@host.com", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''

import re

re_auth = re.compile(r"""
    Authorization:\s*(?P<protocol>[^ ]+)\s+
    qop="(?P<qop>[^"]+)",\s+
    realm="(?P<realm>[^"]+)",\s+
    username="(?P<username>[^"]+)",\s+
    response="(?P<response>[^"]+)",\s+
    cnonce="(?P<cnonce>[^"]+)"
    """, re.VERBOSE)

m = re_auth.match(test)
print(m.groupdict())

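I would recommend finding a proper library for parsing HTTP headers; unfortunately I can't recall one. :(

In the meantime, check the snippet below (it should mostly work):

If your header comes in as a single string that never varies and has as many lines as there are expressions to match, you can split it on newlines into an array called authentication_array and use regexps: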

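import re

pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']
parsed_dict = {}

# authentication_array is the header split on newlines, one parameter per line.
# zip() stops at the shorter sequence, so extra lines cannot cause an IndexError.
for name, line in zip(pattern_array, authentication_array):
    pattern = "(" + name + ")" + '="(.*?)"'   # build a matching pattern
    match = re.search(pattern, line)          # make the match
    if match:
        parsed_dict[match.group(1)] = match.group(2)  # value without the quotes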

A little regex:

import re
reg = re.compile(r'(\w+)[:=] ?"?(\w+)"?')

>>> dict(reg.findall(headers))

{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}

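Your original idea of using PyParsing would be the best approach. What you have implicitly asked for is something that requires a grammar. In other words, a regular expression or a simple parsing routine is always going to be brittle, and that sounds like what you are trying to avoid.

It appears that getting pyparsing onto Google App Engine is easy.

So I would go with that, and then implement full HTTP authentication/authorization header support from RFC 2617.

You can also use urllib2 (urllib.request in Python 3), as [CherryPy][1] does.

Here is the snippet: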

header = """
 Authorization: Digest qop="chap",
     realm="testrealm@host.com",
     username="Foobear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
# In Python 3 these helpers live in urllib.request (urllib2 in Python 2).
from urllib.request import parse_http_list, parse_keqv_list

# Everything after the scheme is a comma-separated list of key="value" items.
field, sep, value = header.partition("Authorization: Digest ")
if value:
    items = parse_http_list(value)
    opts = parse_keqv_list(items)
    opts['protocol'] = 'Digest'
    print(opts)

It outputs:

{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': 'testrealm@host.com', 'response': '6629fae49393a05397450978507c4ef1'}

[1]: Digest http lang:python
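As a sketch of how the two helpers above could be wrapped for reuse, the function below takes only the header value (without the "Authorization:" name) and returns the dictionary the question asks for. The function name parse_authorization is hypothetical, and the sketch assumes Python 3 and a Digest-style list of key=value parameters; a token-style value such as the one used by Basic is not handled.

```python
from urllib.request import parse_http_list, parse_keqv_list


def parse_authorization(value):
    """Split an Authorization header value into its scheme and parameters.

    Hypothetical helper: assumes the value is "<scheme> k1=v1, k2=v2, ...".
    """
    scheme, _, rest = value.strip().partition(" ")
    # parse_http_list splits on commas outside quotes;
    # parse_keqv_list turns each key="value" item into a dict entry.
    params = parse_keqv_list(parse_http_list(rest))
    params["protocol"] = scheme
    return params


header_value = (
    'Digest qop="chap", realm="testrealm@host.com", username="Foobear", '
    'response="6629fae49393a05397450978507c4ef1", '
    'cnonce="5ccc069c403ebaf9f0171e9517f40e41"'
)
result = parse_authorization(header_value)
print(result["protocol"], result["realm"], result["username"])
```

Note that parse_keqv_list raises ValueError for an item without "=", so callers may want to catch that for malformed input.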

Here is my pyparsing attempt:

text = """Authorization: Digest qop="chap",
    realm="testrealm@host.com",     
    username="Foobear",     
    response="6629fae49393a05397450978507c4ef1",     
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" """

from pyparsing import *

AUTH = Keyword("Authorization")
ident = Word(alphas,alphanums)
EQ = Suppress("=")
quotedString.setParseAction(removeQuotes)

valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
authentry = AUTH + ":" + ident("protocol") + valueDict

print(authentry.parseString(text).dump())

which prints:

['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', 'testrealm@host.com'],
 ['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'], 
 ['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41
- protocol: Digest
- qop: chap
- realm: testrealm@host.com
- response: 6629fae49393a05397450978507c4ef1
- username: Foobear

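I'm not familiar with the RFC, but I hope this gets you rolling.

The HTTP digest Authorization header field is a bit of an odd beast. Its format is similar to that of the Cache-Control and Content-Type header fields, but just different enough to be incompatible. If you are still looking for a library that is a little smarter and more readable than the regex, you might try removing the "Authorization: Digest" part and parsing the rest with Werkzeug's http module. (Werkzeug can be installed on App Engine.)

Nadia's regex only matches alphanumeric characters in a parameter value. That means it fails to parse at least two fields, namely uri and qop. According to RFC 2617, the uri field is a duplicate of the string in the request line (i.e. the first line of the HTTP request). And qop fails to parse correctly if its value is "auth-int", because of the non-alphanumeric "-".

This modified regex allows the URI (or any other value) to contain anything except " " (space), '"' (quote), or "," (comma). That is probably more permissive than it needs to be, but it should not cause any problems with correctly formed HTTP requests.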

reg = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')
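To make the difference concrete, here is a small comparison of the two patterns on a header value that contains uri and an unquoted qop=auth-int. The sample header is made up for illustration (it borrows field values from the RFC 2617 example), and the variable names strict and relaxed are not from the answers above.

```python
import re

# Nadia's original pattern: values limited to \w characters.
strict = re.compile(r'(\w+)[:=] ?"?(\w+)"?')
# The modified pattern: values may hold anything but space, quote, or comma.
relaxed = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')

# Illustrative header value, not taken from a real request.
header = ('Digest username="Mufasa", realm="testrealm@host.com", '
          'uri="/dir/index.html", qop=auth-int, nc=00000001')

strict_result = dict(strict.findall(header))
relaxed_result = dict(relaxed.findall(header))

# The strict pattern drops uri entirely and truncates realm and qop.
print("uri" in strict_result, strict_result.get("realm"), strict_result.get("qop"))
# The relaxed pattern keeps the full values.
print(relaxed_result["uri"], relaxed_result["realm"], relaxed_result["qop"])
```

The relaxed pattern still stops at the first space inside a quoted value, so a realm containing spaces would be cut short.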

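Bonus tip: from there, it is fairly straightforward to convert the example code in RFC 2617 into Python. Using Python's md5 API, "MD5Init()" becomes "m = md5.new()", "MD5Update()" becomes "m.update()", and "MD5Final()" becomes "m.digest()".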
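A minimal sketch of that conversion follows, assuming Python 3, where the old md5 module is gone and hashlib.md5() takes its place. It covers only the qop="auth" case with the plain MD5 algorithm; the function names md5_hex and digest_response are hypothetical, and the input values are the worked example from RFC 2617.

```python
import hashlib


def md5_hex(text):
    """Return the lowercase hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop):
    """Compute the request-digest for qop="auth" and algorithm MD5."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # H(A1)
    ha2 = md5_hex(f"{method}:{uri}")                  # H(A2)
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


response = digest_response(
    username="Mufasa",
    realm="testrealm@host.com",
    password="Circle Of Life",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
    qop="auth",
)
print(response)
```

hexdigest() is used instead of digest() because the header carries the digest as hex text, which the RFC sample code produces with a separate conversion step.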

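This is an old question, but one I found very helpful.

I needed a parser that handles any properly formed Authorization header (raise your hand if you enjoy reading ABNF).

This allows parsing any Authorization header: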

parsed = auth_parser.parseString('Authorization: Basic Zm9vOmJhcg==')
print('Authenticating with {0} scheme, token: {1}'.format(parsed['scheme'], parsed['token']))

Which outputs:

Authenticating with Basic scheme, token: Zm9vOmJhcg==

Bringing it all together into an Authenticator class:

import pyparsing as pp
from base64 import b64decode
import re

class Authenticator:
    def __init__(self):
        """
        Use pyparsing to create a parser for Authentication headers
        """
        tchar = "!#$%&'*+-.^_`|~" + pp.nums + pp.alphas
        t68char = '-._~+/' + pp.nums + pp.alphas

        token = pp.Word(tchar)
        token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))

        scheme = token('scheme')

        auth_header = pp.Keyword('Authorization')
        name = pp.Word(pp.alphas, pp.alphanums)
        value = pp.quotedString.setParseAction(pp.removeQuotes)
        name_value_pair = name + pp.Suppress('=') + value
        params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))

        credentials = scheme + (token68('token') ^ params('params'))

        # the moment of truth...
        self.auth_parser = auth_header + pp.Suppress(':') + credentials


    def authenticate(self, auth_header):
        """
        Parse auth_header and call the correct authentication handler
        """
        authenticated = False
        try:
            parsed = self.auth_parser.parseString(auth_header)
            scheme = parsed['scheme']
            details = parsed['token'] if 'token' in parsed.keys() else parsed['params']

            print('Authenticating using {0} scheme'.format(scheme))
            try:
                # '-' goes last so it is a literal, not the range '+' to '.'
                safe_scheme = re.sub(r"[!#$%&'*+.^_`|~-]", '_', scheme.lower())
                handler = getattr(self, 'auth_handle_' + safe_scheme)
                authenticated = handler(details)
            except AttributeError:
                print('This is a valid Authorization header, but we do not handle this scheme yet.')

        except pp.ParseException as ex:
            print('Not a valid Authorization header')
            print(ex)

        return authenticated


    # The following methods are fake, of course.  They should use what's passed
    # to them to actually authenticate, and return True/False if successful.
    # For this demo I'll just print some of the values used to authenticate.
    @staticmethod
    def auth_handle_basic(token):
        print('- token is {0}'.format(token))
        try:
            username, password = b64decode(token).decode().split(':', 1)
        except Exception:
            raise ValueError('Invalid Basic credentials')  # DecodeError was never defined
        print('- username is {0}'.format(username))
        print('- password is {0}'.format(password))
        return True

    @staticmethod
    def auth_handle_bearer(token):
        print('- token is {0}'.format(token))
        return True

    @staticmethod
    def auth_handle_digest(params):
        print('- username is {0}'.format(params['username']))
        print('- cnonce is {0}'.format(params['cnonce']))
        return True

    @staticmethod
    def auth_handle_aws4_hmac_sha256(params):
        print('- signature is {0}'.format(params['Signature']))
        return True

To test this class:

tests = [
    'Authorization: Digest qop="chap", realm="testrealm@example.com", username="Foobar", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"',
    'Authorization: Bearer cn389ncoiwuencr',
    'Authorization: Basic Zm9vOmJhcg==',
    'Authorization: AWS4-HMAC-SHA256 Credential="AKIAIOSFODNN7EXAMPLE/20130524/us-east-1/s3/aws4_request", SignedHeaders="host;range;x-amz-date", Signature="fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024"',
    'Authorization: CrazyCustom foo="bar", fizz="buzz"',
]

authenticator = Authenticator()

for test in tests:
    authenticator.authenticate(test)
    print()

Which outputs:

Authenticating using Digest scheme
- username is Foobar
- cnonce is 5ccc069c403ebaf9f0171e9517f40e41

Authenticating using Bearer scheme
- token is cn389ncoiwuencr

Authenticating using Basic scheme
- token is Zm9vOmJhcg==
- username is foo
- password is bar

Authenticating using AWS4-HMAC-SHA256 scheme
- signature is fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024

Authenticating using CrazyCustom scheme 
This is a valid Authorization header, but we do not handle this scheme yet.

In the future, if we wish to handle CrazyCustom, we will just add

def auth_handle_crazycustom(params):
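The reason one new method is enough is the getattr-based dispatch in authenticate(). The stand-alone sketch below isolates that mechanism without pyparsing. The Handlers class, the dispatch function, the simplified name normalization, and the check inside the handler are all hypothetical stand-ins, not the answer's actual code.

```python
import re


class Handlers:
    """Hypothetical container for per-scheme handler methods."""

    @staticmethod
    def auth_handle_crazycustom(params):
        # Placeholder check for illustration only; a real handler would
        # verify the credentials carried in params.
        return params.get("foo") == "bar"


def dispatch(obj, scheme, details):
    """Look up auth_handle_<scheme> on obj and call it if it exists."""
    # Simplified normalization: anything outside a-z and 0-9 becomes "_".
    safe_scheme = re.sub(r"[^a-z0-9]", "_", scheme.lower())
    handler = getattr(obj, "auth_handle_" + safe_scheme, None)
    if handler is None:
        return False  # valid header, but no handler for this scheme yet
    return handler(details)


handlers = Handlers()
print(dispatch(handlers, "CrazyCustom", {"foo": "bar", "fizz": "buzz"}))
print(dispatch(handlers, "Unknown-Scheme", {}))
```

Passing a default to getattr avoids the try/except AttributeError, which in the class above would also swallow an AttributeError raised inside a handler.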

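Wow, I love Python. "Authorization:" is not actually part of the header string, so I did this instead:

#!/usr/bin/env python
import re

def mymain():
    reg = re.compile('(\w+)[=] ?"?(\w+)"?')
    s = """Digest realm="fireworksproject.com", username="kristofer" """
    print(str(dict(reg.findall(s))))

if __name__ == '__main__':
    mymain()

I'm not getting the "Digest" protocol declaration, but I don't need it anyway. Essentially 3 lines of code... Brilliant!!!

I think it would be more explicit to use a raw string or \\.

If you find this and use it, make sure to add another question mark inside "?(\w+)" so that it becomes "?(\w+)?". That way, if you pass something as "", the parameter is still returned and its value is undefined. And if you really want Digest: /(\w+)([:=])?"(\w+)?"?/ and then check whether the separator is present in the match; if so it is a key:value, otherwise it is something else.

Actually, the " is not mandatory (algorithm, for example, usually does not delimit its value with "), and a value itself can contain an escaped ". "? is a bit risky =)

A more tolerant version: re.compile(r'(\w+)[:=][\s"]?([^",]+)"?')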
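The two caveats raised in these comments (unquoted values and escaped quotes) can be checked with a short experiment. The pattern below is my reading of the "more tolerant" regex from the comment, the sample header value is invented for illustration, and the comparison with urllib.request.parse_http_list is an addition, not something the commenters proposed.

```python
import re
from urllib.request import parse_http_list, parse_keqv_list

# Assumed reconstruction of the "more tolerant" pattern from the comment.
tolerant = re.compile(r'(\w+)[:=][\s"]?([^",]+)"?')

# Invented sample: realm holds escaped quotes, algorithm is unquoted.
value = 'realm="a \\"b\\" c", algorithm=MD5'

regex_result = dict(tolerant.findall(value))
list_result = parse_keqv_list(parse_http_list(value))

# The regex copes with the unquoted algorithm value ...
print(regex_result["algorithm"])
# ... but stops at the first escaped quote inside realm.
print(repr(regex_result["realm"]))
# The quote-aware splitter keeps the whole realm value.
print(repr(list_result["realm"]))
```

This suggests that a regex is adequate only while values are known to be free of embedded quotes and commas.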

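I decided to take this approach and tried to implement a fully compliant parser for the Authorization header using the RFC spec. The task turned out to be much more daunting than I had anticipated. Your choice of the simple regex, while not rigorously correct, is probably the best pragmatic solution. I'll report back if I eventually end up with a fully functional header parser.

Yeah, it would be nice to see something more rigorously correct.

Hi Jason, if you're still looking, see my answer. PyParsing is amazing!

So far this solution has proven super clean, and also very robust. While it is not the most "by the book" implementation of the RFC, I have yet to build a test case that returns invalid values. However, I am only using it to parse the Authorization header; none of the other headers I'm interested in need parsing, so this may not be a good solution as a general HTTP header parser.

I came here looking for a full-fledged RFC-ified parser. Your question and the answer by @PaulMcG put me on the right path (see my answer below). Thank you both! This solution produces correct results.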
