Web scraping: Scrapy FormRequest login not working
I am trying to log in using Scrapy, but I get a lot of "Redirecting (302)" messages. This happens both when I use my real login credentials and when I use fake ones. I also tried another site, but still no luck.
import scrapy
from scrapy.http import FormRequest, Request

class LoginSpider(scrapy.Spider):
    name = 'SOlogin'
    allowed_domains = ['stackoverflow.com']
    login_url = 'https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f'
    test_url = 'http://stackoverflow.com/questions/ask'

    def start_requests(self):
        yield Request(url=self.login_url, callback=self.parse_login)

    def parse_login(self, response):
        return FormRequest.from_response(response, formdata={"email": "XXXXX", "password": "XXXXX"}, callback=self.start_crawl)

    def start_crawl(self, response):
        yield Request(self.test_url, callback=self.parse_item)

    def parse_item(self, response):
        print("Test URL " + response.url)
I also tried adding

meta = {'dont_redirect': True, 'handle_httpstatus_list': [302]}

to both the initial Request and the FormRequest.

Here is the output of the code above:
2017-04-17 21:48:17 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: stackoverflow)
2017-04-17 21:48:17 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'stackoverflow', 'NEWSPIDER_MODULE': 'stackoverflow.spiders', 'SPIDER_MODULES': ['stackoverflow.spiders'], 'USER_AGENT': 'Mozilla/5.0'}
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled item pipelines: []
2017-04-17 21:48:17 [scrapy.core.engine] INFO: Spider opened
2017-04-17 21:48:17 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-04-17 21:48:17 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f> (referer: None)
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX> (referer: ...)
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/questions/ask>
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask>
2017-04-17 21:48:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> (referer: ...)
Test URL https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Closing spider (finished)
2017-04-17 21:48:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1772,
 'downloader/request_count': 5,
 'downloader/request_method_count/GET': 5,
 'downloader/response_bytes': 34543,
 'downloader/response_count': 5,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/302': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 4, 17, 18, 48, 19, 470354),
 'log_count/DEBUG': 6,
 'log_count/INFO': 7,
 'request_depth_max': 2,
 'response_received_count': 3,
 'scheduler/dequeued': 5,
 'scheduler/dequeued/memory': 5,
 'scheduler/enqueued': 5,
 'scheduler/enqueued/memory': 5,
 'start_time': datetime.datetime(2017, 4, 17, 18, 48, 17, 386516)}
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Spider closed (finished)
By default, Scrapy fills your email and password into the first form with a clickable input, which on the login page is the search form; you can see this in the log above, where the credentials end up in the /search query string. You need to select the login form explicitly via formname or formid, e.g.:

FormRequest.from_response(response, formid="login-form", formdata={"email": "XXXXX", "password": "XXXXX"}, callback=self.start_crawl)