Python: how to hand over a login from mechanize to pycurl

I need to log in to a website using mechanize in Python and then continue traversing that site with pycurl. So what I need to know is how to transfer the logged-in state established via mechanize over to pycurl. I assume it is not just a matter of copying the cookies over. Or is it? Code examples are valued ;)

Why I am reluctant to use pycurl alone: I have time constraints, and my mechanize code worked after five minutes of modifying an example, like this:

import mechanize
import cookielib

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follows refresh 0 but does not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# debugging messages?
#br.set_debug_http(True)
#br.set_debug_redirects(True)
#br.set_debug_responses(True)

# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# Open the site
r = br.open('https://thewebsite.com')
html = r.read()

# Show the source
print html
# or
print br.response().read()

# Show the html title
print br.title()

# Show the response headers
print r.info()
# or
print br.response().info()

# Show the available forms
for f in br.forms():
    print f

# Select the first (index zero) form
br.select_form(nr=0)

# Let's log in
br.form['username']='someusername'
br.form['password']='somepwd'
br.submit()

print br.response().read()

# Looking at some results in link format
for l in br.links(url_regex=r'\.com'):
    print l
Now, if I could just transfer the right information from the br object over to pycurl, I would be done.

Why I am reluctant to use mechanize alone: mechanize is based on urllib, and urllib is a nightmare; I have too many traumatic issues with it. I can swallow a call or two in order to log in, but please no more. By comparison, pycurl has proven its stability, customizability, and speed to me. From my experience, pycurl is to urllib as Star Trek is to the Flintstones.


PS: In case anyone is wondering, I use BeautifulSoup once I have the HTML.

Apparently it all comes down to the cookies after all. Here is the code that fetches the login cookie:

import cookielib
import mechanize

def getNewLoginCookieFromSomeWebsite(username = 'someusername', pwd = 'somepwd'):
    """
    returns a login cookie for somewebsite.com by using mechanize
    """
    # Browser
    br = mechanize.Browser()

    # Cookie Jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    # Follows refresh 0 but does not hang on refresh > 0
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    # User-Agent
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:26.0) Gecko/20100101 Firefox/26.0')]

    # Open login site
    response = br.open('https://www.somewebsite.com')

    # Select the first (index zero) form
    br.select_form(nr=0)

    # Enter credentials
    br.form['user']=username
    br.form['password']=pwd
    br.submit()

    # Flatten the jar into a "name=value;" header string; since br was
    # given cj above, we can read the cookies straight from cj
    cookiestr = ""
    for c in cj:
        cookiestr += c.name + '=' + c.value + ';'

    return cookiestr
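The flattening loop above can be sketched in isolation with the standard-library cookie jar (a minimal sketch; `http.cookiejar` is the Python 3 name of `cookielib`, and the cookie here is populated by hand purely for illustration):

```python
# Minimal sketch: flatten a cookie jar into the "name=value;" string
# that pycurl's COOKIE option expects. Uses http.cookiejar (named
# cookielib on Python 2); the cookie is built by hand for illustration.
from http.cookiejar import Cookie, CookieJar

def jar_to_cookie_header(jar):
    """Serialize every cookie in the jar as 'name=value;' pairs."""
    return ''.join(c.name + '=' + c.value + ';' for c in jar)

jar = CookieJar()
jar.set_cookie(Cookie(
    version=0, name='sessionid', value='abc123',
    port=None, port_specified=False,
    domain='www.somewebsite.com', domain_specified=True,
    domain_initial_dot=False, path='/', path_specified=True,
    secure=True, expires=None, discard=True,
    comment=None, comment_url=None, rest={}, rfc2109=False))

print(jar_to_cookie_header(jar))  # sessionid=abc123;
```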
To activate the use of that cookie with pycurl, you simply have to type the following before c.perform() occurs:

c.setopt(pycurl.COOKIE, getNewLoginCookieFromSomeWebsite("username", "pwd"))
Keep in mind: some websites may keep interacting with the cookie via Set-Cookie headers, and pycurl (unlike mechanize) does not do anything with cookies automatically. pycurl merely receives the string and leaves it to the user to decide what to do with it.
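If you would rather let libcurl track those cookie updates itself, one alternative hand-off (my sketch, not part of the answer above) is to save the mechanize jar in Netscape format, which libcurl can read and write natively via its COOKIEFILE/COOKIEJAR options:

```python
# Sketch of an alternative hand-off (an assumption, not from the answer
# above): MozillaCookieJar saves in Netscape format, which libcurl can
# load directly, so pycurl then handles Set-Cookie updates on its own.
from http.cookiejar import MozillaCookieJar  # cookielib on Python 2

jar = MozillaCookieJar('cookies.txt')
# ...pass this jar to br.set_cookiejar(jar) and log in as above, then:
jar.save(ignore_discard=True, ignore_expires=True)

# import pycurl
# c = pycurl.Curl()
# c.setopt(pycurl.COOKIEFILE, 'cookies.txt')  # load the saved cookies
# c.setopt(pycurl.COOKIEJAR, 'cookies.txt')   # write updates back on close
```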