Python Scrapy neither shows any error nor fetches any data
I'm trying to parse product names and prices from a website using Scrapy. However, when I run my Scrapy code, it neither shows any error nor fetches any data. Whatever I'm doing wrong is beyond my ability to spot. Hoping someone will take a look.

items.py includes:
import scrapy

class SephoraItem(scrapy.Item):
    Name = scrapy.Field()
    Price = scrapy.Field()
The spider file named sephorasp.py contains:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class SephoraspSpider(CrawlSpider):
    name = "sephorasp"
    allowed_domains = ['sephora.ae']
    start_urls = ["https://www.sephora.ae/en/stores/"]

    rules = [
        Rule(LinkExtractor(restrict_xpaths='//li[@class="level0 nav-1 active first touch-dd parent"]')),
        Rule(LinkExtractor(restrict_xpaths='//li[@class="level2 nav-1-1-1 active first"]'),
             callback="parse_item")
    ]

    def parse_item(self, response):
        page = response.xpath('//div[@class="product-info"]')
        for titles in page:
            Product = titles.xpath('.//a[@title]/text()').extract()
            Rate = titles.xpath('.//span[@class="price"]/text()').extract()
            yield {'Name': Product, 'Price': Rate}
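For what it's worth, the extraction XPaths in parse_item can be exercised offline against a minimal snippet of product markup (using lxml here as a stand-in for Scrapy's selectors; the markup below is invented for illustration, not taken from the real site):

```python
from lxml import html

# Hypothetical markup mimicking one product card on the category pages.
doc = html.fromstring("""
<div class="product-info">
  <a title="Sample Lipstick" href="/en/p/sample">Sample Lipstick</a>
  <span class="price">AED 99</span>
</div>
""")

# Same expressions as in parse_item, run per product card.
for card in doc.xpath('//div[@class="product-info"]'):
    name = card.xpath('.//a[@title]/text()')
    price = card.xpath('.//span[@class="price"]/text()')
    print({'Name': name, 'Price': price})
```

If these expressions return empty lists against the live pages too, the problem is upstream: the rules never deliver any response to parse_item in the first place.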
Here is a link to the logs:
It works when I play around with BaseSpider:
from scrapy.spider import BaseSpider
from scrapy.http.request import Request

class SephoraspSpider(BaseSpider):
    name = "sephorasp"
    allowed_domains = ['sephora.ae']
    start_urls = [
        "https://www.sephora.ae/en/travel-size/make-up",
        "https://www.sephora.ae/en/perfume/women-perfume",
        "https://www.sephora.ae/en/makeup/eye/eyeshadow",
        "https://www.sephora.ae/en/skincare/moisturizers",
        "https://www.sephora.ae/en/gifts/palettes"
    ]

    def pro(self, response):
        item_links = response.xpath('//a[contains(@class, "level0")]/@href').extract()
        for a in item_links:
            yield Request(a, callback=self.end)

    def end(self, response):
        item_link = response.xpath('//a[@class="level2"]/@href').extract()
        for b in item_link:
            yield Request(b, callback=self.parse)

    def parse(self, response):
        page = response.xpath('//div[@class="product-info"]')
        for titles in page:
            Product = titles.xpath('.//a[@title]/text()').extract()
            Rate = titles.xpath('.//span[@class="price"]/text()').extract()
            yield {'Name': Product, 'Price': Rate}
Your XPaths are badly flawed:
Rule(LinkExtractor(restrict_xpaths='//li[@class="level0 nav-1 active first touch-dd parent"]')),
Rule(LinkExtractor(restrict_xpaths='//li[@class="level2 nav-1-1-1 active first"]'),
You are matching the entire class string, which can change at any time, and the tokens may appear in a different order. Just pick one class token that is most likely unique enough:
Rule(LinkExtractor(restrict_xpaths='//li[contains(@class,"level0")]')),
Rule(LinkExtractor(restrict_xpaths='//li[contains(@class,"level2")]')),
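The difference is easy to demonstrate offline (using lxml here, with invented menu markup): the exact-match expression silently returns nothing as soon as a single class token changes, while contains() keeps matching:

```python
from lxml import html

# Hypothetical menu markup where the "active" class happens to be absent.
doc = html.fromstring("""
<ul>
  <li class="level0 nav-1 first touch-dd parent"><a href="/en/makeup">Makeup</a></li>
  <li class="level2 nav-1-1-1 first"><a href="/en/makeup/eye">Eye</a></li>
</ul>
""")

# Exact match on the full class string: brittle, matches nothing here.
exact = doc.xpath('//li[@class="level0 nav-1 active first touch-dd parent"]')
print(len(exact))   # 0

# Matching a single stable token: still finds the item.
robust = doc.xpath('//li[contains(@class, "level0")]')
print(len(robust))  # 1
```

Because @class is compared as a plain string in XPath 1.0, any difference in token set or order breaks an exact match; contains() only requires the one token you care about to be present.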
Can you post the crawl log? You can do that with the command scrapy crawl sephorasp -s LOG_FILE=output.log, or with scrapy crawl sephorasp &> output.log. – Granitosaurus

Thank you for the reply, sir Granitosaurus. I've added what you asked for, but I couldn't upload it in a searchable format. I also ran my code with your corrected XPaths, but new errors showed up in the console I uploaded; the image above is the updated one. Thanks. – asker

Dear Granitosaurus, I captured the log as you suggested, but I can't upload it; it's huge. Can you tell me how? – asker

@SMth80 use some kind of pastebin, e.g. upload the log and share the link. – Granitosaurus

Okay sir, I will do that. By the way, if I write the code with BaseSpider it more or less works; I've updated the question above with the working version. However, I'd like to parse without typing in every URL. – asker