What should I do to enable cookies and use Scrapy with this URL?
I am working on a scraping project with Scrapy against this URL. I tried opening the URL in the shell, but it returned a 430 error, so I added some settings to the headers, like this:

scrapy shell -s COOKIES_ENABLED=1 -s USER_AGENT='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0'

That fetched the page with a 200, but as soon as I use view(response), it takes me to a page that says:

Sorry! Your web browser is not accepting cookies.

Here is a screenshot of the log:
Set
COOKIES_ENABLED = True
in your settings.py file. Also see
COOKIES_DEBUG = True
to debug cookies; you will see exactly which cookies come in and go out with each response and request. Try sending all the required headers:
headers = {
    'dnt': '1',
    'accept-encoding': 'gzip, deflate, sdch, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'cache-control': 'max-age=0',
    'authority': 'www.walmart.ca',
    'cookie': 'JSESSIONID=E227789DA426B03664F0F5C80412C0BB.restapp-108799501-8-112264256; cookieLanguageType=en; deliveryCatchment=2000; marketCatchment=2001; zone=2; originalHttpReferer=; walmart.shippingPostalCode=V5M2G7; defaultNearestStoreId=1015; walmart.csrf=6f635f71ab4ae4479b8e959feb4f3e81d0ac9d91-1497631184063-441217ff1a8e4a311c2f9872; wmt.c=0; userSegment=50-percent; akaau_P1=1497632984~id=bb3add0313e0873cf64b5e0a73e3f5e3; wmt.breakpoint=d; TBV=7; ENV=ak-dal-prod; AMCV_C4C6370453309C960A490D44%40AdobeOrg=793872103%7CMCIDTS%7C17334',
    'referer': 'https://www.walmart.ca/en/clothing-shoes-accessories/men/mens-tops/N-2566+11',
}
yield Request(url='https://www.walmart.ca/en/clothing-shoes-accessories/men/mens-tops/N-2566+11', headers=headers)
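Hard-coding the whole session cookie into a raw 'cookie' header is brittle, since values like JSESSIONID expire. As a sketch (not part of the original answer), you can instead parse the cookie header string into a dict with the standard library and pass it to Scrapy's Request via its cookies argument, letting the built-in cookie middleware manage it from there:

```python
from http.cookies import SimpleCookie

def cookie_header_to_dict(header: str) -> dict:
    """Parse a raw Cookie header string into a name -> value dict."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}

# Shortened sample of the cookie string from the headers above
cookies = cookie_header_to_dict('cookieLanguageType=en; zone=2; wmt.c=0')
# The dict can then be passed as scrapy.Request(url, cookies=cookies)
```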
You can implement it like this: instead of using start_urls, I recommend the start_requests() method. It is easier to read:
from scrapy import Request
from scrapy.spiders import CrawlSpider
from myproject.items import CrawlingItem  # adjust to your project's items module

class EasySpider(CrawlSpider):
    name = 'easy'

    def start_requests(self):
        headers = {
            'dnt': '1',
            'accept-encoding': 'gzip, deflate, sdch, br',
            'accept-language': 'en-US,en;q=0.8',
            'upgrade-insecure-requests': '1',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'cache-control': 'max-age=0',
            'authority': 'www.walmart.ca',
            'cookie': 'JSESSIONID=E227789DA426B03664F0F5C80412C0BB.restapp-108799501-8-112264256; cookieLanguageType=en; deliveryCatchment=2000; marketCatchment=2001; zone=2; originalHttpReferer=; walmart.shippingPostalCode=V5M2G7; defaultNearestStoreId=1015; walmart.csrf=6f635f71ab4ae4479b8e959feb4f3e81d0ac9d91-1497631184063-441217ff1a8e4a311c2f9872; wmt.c=0; userSegment=50-percent; akaau_P1=1497632984~id=bb3add0313e0873cf64b5e0a73e3f5e3; wmt.breakpoint=d; TBV=7; ENV=ak-dal-prod; AMCV_C4C6370453309C960A490D44%40AdobeOrg=793872103%7CMCIDTS%7C17334',
            'referer': 'https://www.walmart.ca/en/clothing-shoes-accessories/men/mens-tops/N-2566+11',
        }
        yield Request(url='https://www.walmart.ca/en/clothing-shoes-accessories/men/mens-tops/N-2566+11', callback=self.parse_item, headers=headers)

    def parse_item(self, response):
        i = CrawlingItem()
        i['title'] = " ".join(response.xpath('//a/text()').extract()).strip()
        yield i
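Note that parse_item joins the text of every anchor on the page into a single title field, and strip() only trims the outer ends of the joined string. The following plain-Python sketch (with hypothetical values standing in for response.xpath('//a/text()').extract()) shows the difference, and how stripping each piece before joining gives cleaner output:

```python
# Hypothetical extracted anchor texts (not real Walmart data)
extracted = ["  Men's Tops  ", "T-Shirts", "Polos"]

# The pattern used in parse_item: join first, strip last.
# Whitespace inside each piece survives; only the outer ends are trimmed.
title = " ".join(extracted).strip()

# Stripping each piece before joining removes the internal runs too.
clean_title = " ".join(part.strip() for part in extracted)
```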
I can confirm that the
COOKIES_ENABLED
setting did not help fix the error. Instead, using the following Googlebot USER_AGENT made it work:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36
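As a sketch of how this answer's advice would be applied, the user agent can be set project-wide in settings.py (note that W.X.Y.Z is a placeholder from the answer above, not a real Chrome version):

```python
# settings.py -- project-wide user agent for all Scrapy requests
USER_AGENT = (
    'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; '
    'Googlebot/2.1; http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36'
)
```

A per-request override is also possible by setting the 'user-agent' key in a Request's headers dict, as shown in the earlier answers.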
Thanks to the author of this script, which makes requests with that user agent.

It did not solve the problem; here is the log:
Set-Cookie: akaau_P1=1497629246~id=6be87a77f2656d101e24517432b9abc; path=/
2017-06-16 17:37:26 [scrapy.core.engine] DEBUG: Crawled (403) (referer: None)
2017-06-16 17:37:26 [scrapy.spidermiddleware.httperror] INFO: Ignoring response: HTTP status code is not handled or not allowed
Could you explain how to implement this in code? Here is mine:

class EasySpider(CrawlSpider):
    name = 'easy'
    start_urls = ['https://www.walmart.ca/en/clothing-shoes-accessories/men/mens-tops/N-2566+11']

    def parse_item(self, response):
        i = CrawlingItem()
        i['title'] = ''.join(response.xpath('//a/text()').extract()).strip()
        return i