Website returns 501 when trying to log in with Python


I'm talking about this website:

I'm trying to log in as follows:

def login(self, username, password):
    #form_doc: a lxml.html object
    form_doc = self.browser.getdoc("http://www.belegger.nl/mijnbelegger/voorpagina")
    form_html = form_doc.cssselect("div.loginPanel form")[0]
    form_dict = {inp.get('name') : inp.get('value') for inp in form_html.cssselect("input")}
    form_dict['username'] = username
    form_dict['password'] = password

    #form_dict now contains all the correct inputs and their values

    #then, i precisely copy all the headers of a successful browser login:
    self.add_headers()        

    #what follows is a POST request:
    self.browser.open("http://www.belegger.nl/mijnbelegger/voorpagina", self.browser.urlencode(form_dict))

def add_headers(self):
    headers = {
        'Host' : 'www.belegger.nl',
        'User-Agent' : 'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0',
        'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language' : 'en-gb,en;q=0.5',
        'Accept-Encoding' : 'gzip, deflate',
        'Referer' : 'http://www.belegger.nl/mijnbelegger/profiel',
        'Content-Length' : '187',
        'Content-Type' : 'text/plain; charset=UTF-8',
        'Connection' : 'keep-alive',
        'Pragma' : 'no-cache',
        'Cache-Control' : 'no-cache'
    }
    for header in headers.items():
        self.browser.opener.addheaders.append(header)

This is the request that is finally sent:

POST /mijnbelegger/voorpagina HTTP/1.1
Content-Length: 115
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: close
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0
Host: www.belegger.nl
Referer: http://www.belegger.nl/mijnbelegger/profiel
Pragma: no-cache
Cache-Control: no-cache
Content-Type: application/x-www-form-urlencoded

formtoken=c5851960db739b61bc28afd4f23cec2badacb807&username=xxx&password=xxx&Inloggen=Inloggen
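
The request sent from Python is almost identical to the one sent by Firefox. The only difference is the "Connection" header, which apparently cannot be changed through urllib2.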
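One detail stands out when comparing the two listings: the header dictionary hard-codes Content-Length: 187 and Content-Type: text/plain, while the request actually sent carries Content-Length: 115 and Content-Type: application/x-www-form-urlencoded. The sketch below shows one way to avoid hard-coding headers that describe the body, so that the library derives them from the encoded form data. It is written for Python 3's urllib.request rather than the urllib2 wrapper used above. The helper name build_login_request is hypothetical, and the form values are the placeholders from the request dump. Whether this mismatch has anything to do with the 501 is not established here.

```python
from urllib.parse import urlencode
from urllib.request import Request

LOGIN_URL = "http://www.belegger.nl/mijnbelegger/voorpagina"


def build_login_request(url, form_dict):
    """Build a POST request without hard-coded body headers (hypothetical helper)."""
    # Encode the form fields exactly once; this byte string is the request body.
    body = urlencode(form_dict).encode("ascii")

    # Only headers that do not describe the body are set by hand.
    # Content-Length and Content-Type are deliberately left out: when they are
    # absent, urllib adds them at send time (the body length and
    # application/x-www-form-urlencoded respectively).
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:18.0) "
                      "Gecko/20100101 Firefox/18.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-gb,en;q=0.5",
        "Referer": "http://www.belegger.nl/mijnbelegger/profiel",
    }
    # Passing data= makes this a POST request.
    return Request(url, data=body, headers=headers)


# Placeholder values copied from the request dump above.
form = {
    "formtoken": "c5851960db739b61bc28afd4f23cec2badacb807",
    "username": "xxx",
    "password": "xxx",
    "Inloggen": "Inloggen",
}
req = build_login_request(LOGIN_URL, form)
print(req.get_method(), len(req.data))
```

Nothing is sent over the network here; passing req to urllib.request.urlopen would perform the actual login attempt. As with urllib2, urllib.request still sends Connection: close, so that difference from the browser request remains.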

But there's more:

If I try to replay a successful login with the Firefox add-on "Live HTTP Headers", I get the same 501 error. I even get it when I use the correct 'formtoken' value.


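So what is the cause of this 501?

I see two different Content-Type values, is that right?

This is what Firebug says: "Request Headers From Upload Stream: Content-Length: 115, Content-Type: application/x-www-form-urlencoded". I don't know what "upload stream" means, but I assume it is correct.

Yes, the Content-Type in the request headers is text/plain. I wonder why Firebug says application/x-www-urlencoded…

What is this? What is the Content-Length?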