How to get cookies from a scrapy-splash response in Scrapy


I want to get the cookie values from Splash's response object, but it doesn't work as I expected.

Here is the spider code:

import scrapy
from scrapy_splash import SplashRequest


class AmazonSpider(scrapy.Spider):
    name = 'amazon'
    allowed_domains = ['amazon.com']

    def start_requests(self):
        url = 'https://www.amazon.com/gp/goldbox?ref_=nav_topnav_deals'
        yield SplashRequest(url, self.parse, args={'wait': 0.5})

    def parse(self, response):
        print(response.headers)
Output log:

2019-08-17 11:53:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/robots.txt> (referer: None)
2019-08-17 11:53:08 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://192.168.99.100:8050/robots.txt> (referer: None)
2019-08-17 11:53:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/gp/goldbox?ref_=nav_topnav_deals via http://192.168.99.100:8050/render.html> (referer: None)
{b'Date': [b'Sat, 17 Aug 2019 06:23:09 GMT'], b'Server': [b'TwistedWeb/18.9.0'], b'Content-Type': [b'text/html; charset=utf-8']}
2019-08-17 11:53:24 [scrapy.core.engine] INFO: Closing spider (finished)

You can try the following: write a small Lua script that returns the HTML plus the cookies:

lua_request = """
    function main(splash)
        splash:init_cookies(splash.args.cookies)
        assert(splash:go(splash.args.url))
        splash:wait(0.5)
        return {
            html = splash:html(),
            cookies = splash:get_cookies()
        }
    end
    """
Then change your request to the following (note that `lua_request` must be defined as an attribute on the spider, since it is referenced as `self.lua_request`):

yield SplashRequest(
    url,
    self.parse,
    endpoint='execute',
    args={'lua_source': self.lua_request}
)
Then read the cookies in your parse method like this:

def parse(self, response):
    cookies = response.data['cookies']
    headers = response.headers
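For reference, Splash's `get_cookies()` returns a list of cookie dicts (with `name`, `value`, `domain`, `path`, and so on) rather than a plain mapping, so `response.data['cookies']` is a list. A minimal helper can flatten it into a `{name: value}` dict for easier lookups; the `cookies_to_dict` name and the sample values below are made up for illustration, not part of scrapy-splash:

```python
def cookies_to_dict(splash_cookies):
    """Flatten Splash's list of cookie dicts into a {name: value} mapping."""
    return {c['name']: c['value'] for c in splash_cookies}


# Example shaped like what response.data['cookies'] might contain:
sample = [
    {'name': 'session-id', 'value': '123-456', 'domain': '.amazon.com', 'path': '/'},
    {'name': 'ubid-main', 'value': '789-000', 'domain': '.amazon.com', 'path': '/'},
]
print(cookies_to_dict(sample))
# {'session-id': '123-456', 'ubid-main': '789-000'}
```

This keeps the full cookie metadata available in the original list while giving you a quick way to look up individual values by name.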

Will it also capture the cookies from subsequent AJAX requests on the same page? Could you update your answer to also get the header data from the response?