Python: Unable to scrape IMDB after logging in to a personal profile with Scrapy?
Tags: python, login, web-scraping, scrapy

Below is my code. After logging in successfully, I cannot scrape IMDB. The after_login callback checks whether the form request worked, but when I issue a new request and print the page content after logging in, I get the generic IMDB home page instead of the home page for the logged-in user.
import scrapy


# The class statement was missing from the post; the name IMDBSpider is assumed.
class IMDBSpider(scrapy.Spider):
    """
    Attributes:
        name (str): essential attribute which specifies the name of the spider
        start_urls (list): the urls that are to be scraped
    """
    name = 'IMDB_spider'
    start_urls = ['https://www.imdb.com/ap/signin?clientContext=131-8656718-8097200&openid.pape.max_auth_age=0&openid.'
                  'return_to=https%3A%2F%2Fwww.imdb.com%2Fap-signin-handler&openid.identity=http%3A%2F%2Fspecs.openid.ne'
                  't%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=imdb_us&openid.mode=checkid_setup&siteState=ey'
                  'JvcGVuaWQuYXNzb2NfaGFuZGxlIjoiaW1kYl91cyIsInJlZGlyZWN0VG8iOiJodHRwczovL3d3dy5pbWRiLmNvbS8_cmVmXz1sb2d'
                  'pbiJ9&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.ns=http'
                  '%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0&&tag=imdbtag_reg-20']

    def parse(self, response):
        """
        Scrapy's default callback, which handles the downloaded response for
        each request made. Here it submits the sign-in form.
        Arguments:
            response (scrapy.http.Response): contains all data of the page
                and other helpful methods as well
        """
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': '*******', 'password': '****'},
            callback=self.after_login
        )

    def after_login(self, response):
        """
        Callback that is called after the login form is submitted, to check
        whether the login worked.
        Arguments:
            response (scrapy.http.Response): contains all data of the page
                and other helpful methods as well
        """
        # response.text is the decoded body (str); response.body is bytes on Python 3.
        if "There was a problem." in response.text:
            print('Login Failed')
            return
        print('Login Success')
        return scrapy.Request(url="http://www.imdb.com",
                              callback=self.parse_imdb_page)

    def parse_imdb_page(self, response):
        # print is a function on Python 3
        print(response.text)
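Please help.

Comments:

At least remove your username and password :p

Are you sure you logged in successfully? "Login Success" is printed whenever the text "There was a problem." is absent from the HTML body.

That is the message shown when the login fails.

Your check for whether you are logged in is very weak. When I tested the form request in the shell, I got a page asking me to enable cookies. Make sure cookies are enabled, and focus on verifying that the page shows you are logged in, not the other way around.

I have cookies enabled.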
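
To make the advice in the comments concrete, here is a minimal sketch of a positive login check, plus Scrapy settings that log cookie traffic. The helper name `login_status`, the constant names, and every marker string except "There was a problem." are assumptions for illustration; they are not verified against IMDB's actual markup, so replace them with text you can see on your own signed-in page.

```python
# Hypothetical marker strings: text expected only on a signed-in page.
# These are assumptions, not taken from IMDB's real markup.
LOGGED_IN_MARKERS = ("Sign Out", "Your Watchlist")

# "There was a problem." comes from the question; "Enable Cookies" is an
# assumed wording for the cookie page mentioned in the comments.
LOGIN_ERROR_MARKERS = ("There was a problem.", "Enable Cookies")


def login_status(html,
                 logged_in_markers=LOGGED_IN_MARKERS,
                 error_markers=LOGIN_ERROR_MARKERS):
    """Classify a page body as 'failed', 'logged_in', or 'unknown'.

    The absence of an error message is not treated as success: the page
    must contain a marker that only a signed-in user would see.
    """
    if any(marker in html for marker in error_markers):
        return "failed"
    if any(marker in html for marker in logged_in_markers):
        return "logged_in"
    return "unknown"


# Real Scrapy settings; COOKIES_DEBUG logs the cookies sent in requests
# and received in responses, which shows whether the session cookie
# survives from the login response to the follow-up request.
COOKIE_DEBUG_SETTINGS = {
    "COOKIES_ENABLED": True,  # already Scrapy's default
    "COOKIES_DEBUG": True,
}
```

In the spider, `after_login` could call `login_status(response.text)` and only continue when it returns `'logged_in'`, and the spider class could set `custom_settings = COOKIE_DEBUG_SETTINGS` while debugging. An `'unknown'` result is worth treating as a failure until you have inspected the page by hand.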