Web scraping: Scrapy FormRequest login not working


I am trying to log in with Scrapy, but I keep getting a lot of "Redirecting (302)" messages. This happens whether I use my real login details or fake ones. I also tried another website, but still no luck.

import scrapy
from scrapy.http import FormRequest, Request

class LoginSpider(scrapy.Spider):
    name = 'SOlogin'
    allowed_domains = ['stackoverflow.com']

    login_url = 'https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f'
    test_url = 'http://stackoverflow.com/questions/ask'

    def start_requests(self):
        yield Request(url=self.login_url, callback=self.parse_login)

    def parse_login(self, response):
        return FormRequest.from_response(response, formdata={"email": "XXXXX", "password": "XXXXX"}, callback=self.start_crawl)

    def start_crawl(self, response):
        yield Request(self.test_url, callback=self.parse_item)

    def parse_item(self, response):
        print("Test URL " + response.url)
I also tried adding

meta = {'dont_redirect': True, 'handle_httpstatus_list': [302]}

to the initial Request and to the FormRequest.
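
For reference, attaching those meta keys to both requests would look roughly like this (a sketch of that attempted workaround, not the code whose output is shown below):

def start_requests(self):
    # dont_redirect stops the RedirectMiddleware from following 302s;
    # handle_httpstatus_list lets the 302 response reach the callback instead of being filtered
    yield Request(url=self.login_url,
                  meta={'dont_redirect': True, 'handle_httpstatus_list': [302]},
                  callback=self.parse_login)

def parse_login(self, response):
    return FormRequest.from_response(response,
                                     formdata={"email": "XXXXX", "password": "XXXXX"},
                                     meta={'dont_redirect': True, 'handle_httpstatus_list': [302]},
                                     callback=self.start_crawl)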

Here is the output of the original code above (without those meta keys):

2017-04-17 21:48:17 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: stackoverflow)
2017-04-17 21:48:17 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'stackoverflow', 'NEWSPIDER_MODULE': 'stackoverflow.spiders', 'SPIDER_MODULES': ['stackoverflow.spiders'], 'USER_AGENT': 'Mozilla/5.0'}
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled item pipelines: []
2017-04-17 21:48:17 [scrapy.core.engine] INFO: Spider opened
2017-04-17 21:48:17 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-04-17 21:48:17 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f> (referer: None)
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX> (referer: ...)
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/questions/ask>
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask>
2017-04-17 21:48:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> (referer: ...)
Test URL https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Closing spider (finished)
2017-04-17 21:48:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1772,
 'downloader/request_count': 5,
 'downloader/request_method_count/GET': 5,
 'downloader/response_bytes': 34543,
 'downloader/response_count': 5,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/302': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 4, 17, 18, 48, 19, 470354),
 'log_count/DEBUG': 6,
 'log_count/INFO': 7,
 'request_depth_max': 2,
 'response_received_count': 3,
 'scheduler/dequeued': 5,
 'scheduler/dequeued/memory': 5,
 'scheduler/enqueued': 5,
 'scheduler/enqueued/memory': 5,
 'start_time': datetime.datetime(2017, 4, 17, 18, 48, 17, 386516)}
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Spider closed (finished)


By default, Scrapy fills your email and password into the first form with clickable inputs that it finds on the page, which on this login page is the search form. You need to target the login form explicitly via formname or formid, e.g.
FormRequest.from_response(response, formid="login-form", formdata={"email": "XXXXX", "password": "XXXXX"}, callback=self.start_crawl)
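
For completeness, here is a sketch of what the corrected callback could look like, assuming the login form's id really is login-form as in the snippet above; the rest of the spider stays unchanged:

def parse_login(self, response):
    # Select the login form explicitly by its id; without formid/formname,
    # from_response picks the first form on the page (here, the search form).
    return FormRequest.from_response(
        response,
        formid="login-form",  # assumed id of the login form, per the answer above
        formdata={"email": "XXXXX", "password": "XXXXX"},
        callback=self.start_crawl,
    )

With the correct form selected, from_response submits using that form's own action and method (a POST to the login endpoint) instead of the GET to /search?q=&email=XXXXX&password=XXXXX that is visible in the log above.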