Python Mechanize - Login

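I'm trying to log in to a website and scrape data from it, but I can't get mechanize to work on the following site. I've included the HTML below. Could someone give me a short walkthrough of how to log in and print the next page?

I've tried using mechanize and looping over br.forms(). I can see the form that way, but I'm having trouble filling in the username and password and then clicking submit.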

<div class="loginform" id="loginpage" style="width: 300px;">
<div class="loginformentries" style="overflow: hidden;">
<div class="clearfix">
<div class="loginformtitle">Sign-in to your account</div>
</div>
<div class="clearfix">
<div class="loginformlabel"><label for="USERID">Username:</label></div>
<div class="loginforminput"><input name="USERID" id="USERID" style="width: 150px;" type="text" value=""></div>
</div>
<div class="clearfix">
<div class="loginformlabel"><label for="PASSWDTXT">Password:</label></div>
<div class="loginforminput"><input name="PASSWDTXT" id="PASSWDTXT" style="width: 150px;" type="password" value=""></div>
</div>
<div class="clearfix">
<div class="loginformlabel"><label for="usertype">Select Role:</label></div>
<div class="loginforminput"><select name="usertype" id="usertype" style="width: 150px;"><option value="participant">Participant</option>
<option value="sponsor">Sponsor</option></select></div>
</div>
<div class="loginformsubmit" style="text-align: right;"><span class="button"><button class="buttoninsidebuttonclass" type="submit">Login</button></span></div>
</div>
<div class="loginformdescription">Both entries are case sensitive. If you fail to login <strong>five</strong> consecutive times your account could be disabled.</div>
</div>
</div>
</div>

I also don't know how to verify whether I've actually reached the next page.
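The values mechanize needs for br[...] are the name attributes of the controls, not their labels or ids. As a stdlib-only sketch (the FieldLister class and the trimmed LOGIN_HTML string are illustrative, not part of mechanize), html.parser can list every control in the posted markup:

```python
from html.parser import HTMLParser

# Trimmed copy of the markup posted above (only the form controls are kept).
LOGIN_HTML = """
<input name="USERID" id="USERID" type="text" value="">
<input name="PASSWDTXT" id="PASSWDTXT" type="password" value="">
<select name="usertype" id="usertype">
  <option value="participant">Participant</option>
  <option value="sponsor">Sponsor</option>
</select>
<button class="buttoninsidebuttonclass" type="submit">Login</button>
"""


class FieldLister(HTMLParser):
    """Collects (tag, type, name) for form controls and the option values of each <select>."""

    def __init__(self):
        super().__init__()
        self.fields = []          # list of (tag, type, name) tuples
        self.options = {}         # select name -> list of option values
        self._current_select = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("input", "select", "textarea", "button"):
            self.fields.append((tag, attrs.get("type", ""), attrs.get("name")))
        if tag == "select":
            self._current_select = attrs.get("name")
            self.options[self._current_select] = []
        elif tag == "option" and self._current_select is not None:
            self.options[self._current_select].append(attrs.get("value"))

    def handle_endtag(self, tag):
        if tag == "select":
            self._current_select = None


parser = FieldLister()
parser.feed(LOGIN_HTML)
for tag, kind, name in parser.fields:
    print(tag, kind, name)
print(parser.options)
```

This suggests the names USERID, PASSWDTXT and usertype. Because usertype is a select control, mechanize expects a list of option values for it, e.g. br["usertype"] = ["participant"]; the submit button has no name, so a plain br.submit() is enough.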

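All those div elements should be wrapped in a form element. Find that form and look at its name attribute: that is the form you want to log in with. Then you can use the snippet below to log in and get the cookies you will use for further browsing: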

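import mechanize

# Hypothetical placeholders: replace them with the real values for your site
LOGIN_URL = "https://example.com/login"       # the page that shows the login form
FORM_NAME = "the name of the form from above"  # the form's name attribute
USERNAME = "yourLogin"
PASSWORD = "yourPassword"

# Browser
br = mechanize.Browser()

# Cookie jar: mechanize stores the session cookies it receives here
cookiejar = mechanize.LWPCookieJar()
br.set_cookiejar(cookiejar)

# Browser options
br.set_handle_equiv(True)      # treat <meta http-equiv> tags like HTTP headers
br.set_handle_gzip(True)       # accept gzip-compressed responses
br.set_handle_redirect(True)   # follow HTTP redirects
br.set_handle_referer(True)    # send Referer headers
br.set_handle_robots(False)    # ignore robots.txt

# Follow HTTP Refresh headers, but only those whose delay is at most 1 second
br.set_handle_refresh(True, max_time=1)

br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# Authenticate
br.open(LOGIN_URL)
br.select_form(name=FORM_NAME)
# These two field names come from the HTML you posted
br["USERID"] = USERNAME
br["PASSWDTXT"] = PASSWORD
res = br.submit()

# submit() only sends the form; check where you ended up to confirm the login
print(res.geturl())
print(br.title())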
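br.submit() returning without an exception only means the request was sent; a wrong password often just brings back the login page. One possible heuristic, sketched with the stdlib and assuming (the question does not confirm this) that this site re-displays the PASSWDTXT field after a failed login, is to check whether the returned HTML still contains that password field. The helper names are illustrative:

```python
from html.parser import HTMLParser


class PasswordFieldFinder(HTMLParser):
    """Records whether the page has an <input type="password">, optionally with a given name."""

    def __init__(self, field_name=None):
        super().__init__()
        self.field_name = field_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "password":
            if self.field_name is None or attrs.get("name") == self.field_name:
                self.found = True


def still_on_login_page(html, field_name="PASSWDTXT"):
    """Heuristic: True if the page still shows the login form's password field."""
    finder = PasswordFieldFinder(field_name)
    finder.feed(html)
    finder.close()
    return finder.found


# With mechanize, pass the decoded body of the response from br.submit():
#   html = res.read().decode("utf-8", errors="replace")  # assumes UTF-8
#   if still_on_login_page(html):
#       print("Login seems to have failed")
print(still_on_login_page('<input name="PASSWDTXT" type="password">'))
print(still_on_login_page("<h1>Welcome back</h1>"))
```

Checking for the login form rather than for some "welcome" text keeps the test independent of what the logged-in page looks like, which the question does not show.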

After that, your login cookie is stored in cookiejar, and you can use the same br object to fetch any page you like:

response = br.open("https://example.com/page/needed/after/login")  # hypothetical URL
returnPage = response.read()

This gives you the HTML source of that page (as bytes under Python 3), which you can then parse however you like.
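Because read() returns bytes under Python 3, decode the page before parsing it. What "parse" means depends on the page; as a stdlib-only sketch (TextExtractor and extract_text are illustrative names, and UTF-8 is an assumption, since the real encoding may differ), this pulls the title and the visible text chunks out of a fetched page while skipping script and style contents:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the page title and visible text chunks, ignoring <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._stack = []  # currently open title/script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "script", "style"):
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if "title" in self._stack:
            self.title += text
        elif not any(t in ("script", "style") for t in self._stack):
            self.chunks.append(text)


def extract_text(page_bytes, encoding="utf-8"):
    """Decode raw bytes (e.g. returnPage) and return (title, list of text chunks)."""
    parser = TextExtractor()
    parser.feed(page_bytes.decode(encoding, errors="replace"))
    parser.close()
    return parser.title, parser.chunks


title, chunks = extract_text(b"<html><head><title>Account</title></head>"
                             b"<body><p>Balance: 42</p><script>var x=1;</script></body></html>")
print(title)   # Account
print(chunks)  # ['Balance: 42']
```

In the answer's flow this would be called as extract_text(returnPage).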

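This seems to work. Can you tell me the best way to scrape the next page, if I just want all of its information?

@Dumbkid_ Look at the edit.

Also, I've heard BeautifulSoup makes it easier to parse HTML elements, but I've never used it and I'm not sure whether it can be used together with mechanize.

Sometimes the single form on a page has no name. In that case,

br.form = list(br.forms())[0]

can be used in place of

br.select_form(name="the name of the form from above")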
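mechanize also has built-in ways to pick an unnamed form: br.select_form(nr=0) selects the first form on the page, and br.select_form(predicate=...) selects the first form for which a function returns True. When a page has several forms, choosing the one that contains the username field may be more robust than relying on its position. A minimal sketch, assuming the field name USERID from the question; has_control is a hypothetical helper that only relies on each form exposing a controls list whose items have a name, as mechanize's parsed forms do:

```python
from types import SimpleNamespace


def has_control(form, field_name="USERID"):
    """True if the form contains a control whose name is field_name.

    Works on any object exposing `.controls` whose items expose `.name`,
    which is how mechanize represents parsed forms.
    """
    return any(control.name == field_name for control in form.controls)


# With mechanize (needs the real login page, so it is only shown as a comment):
#   br.open(LOGIN_URL)
#   br.select_form(predicate=has_control)  # first form that contains USERID
#   # or simply: br.select_form(nr=0)       # first form on the page

# Offline demo with stand-in objects shaped like mechanize forms:
search = SimpleNamespace(controls=[SimpleNamespace(name="q")])
login = SimpleNamespace(controls=[SimpleNamespace(name="USERID"),
                                  SimpleNamespace(name="PASSWDTXT")])
print([has_control(f) for f in (search, login)])  # [False, True]
```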
