Python Scrapy neither shows any error nor fetches any data
I'm trying to parse product names and prices from a website using Scrapy. However, when I run my Scrapy code, it neither shows any error nor fetches any data. Whatever I'm doing wrong is beyond my ability to spot. Hoping someone will take a look.

items.py includes:
import scrapy

class SephoraItem(scrapy.Item):
    Name = scrapy.Field()
    Price = scrapy.Field()
The spider file named sephorasp.py contains:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class SephoraspSpider(CrawlSpider):
    name = "sephorasp"
    allowed_domains = ['sephora.ae']
    start_urls = ["https://www.sephora.ae/en/stores/"]

    rules = [
        Rule(LinkExtractor(restrict_xpaths='//li[@class="level0 nav-1 active first touch-dd parent"]')),
        Rule(LinkExtractor(restrict_xpaths='//li[@class="level2 nav-1-1-1 active first"]'),
             callback="parse_item")
    ]

    def parse_item(self, response):
        page = response.xpath('//div[@class="product-info"]')
        for titles in page:
            Product = titles.xpath('.//a[@title]/text()').extract()
            Rate = titles.xpath('.//span[@class="price"]/text()').extract()
            yield {'Name': Product, 'Price': Rate}
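For what it's worth, the extraction XPaths in parse_item can be exercised offline against a minimal snippet of product markup (using lxml here as a stand-in for Scrapy's selectors; the markup below is invented for illustration, not taken from the real site):

```python
from lxml import html

# Hypothetical markup mimicking one product card on the category pages.
doc = html.fromstring("""
<div class="product-info">
  <a title="Sample Lipstick" href="/en/p/sample">Sample Lipstick</a>
  <span class="price">AED 99</span>
</div>
""")

# Same expressions as in parse_item, run per product card.
for card in doc.xpath('//div[@class="product-info"]'):
    name = card.xpath('.//a[@title]/text()')
    price = card.xpath('.//span[@class="price"]/text()')
    print({'Name': name, 'Price': price})
```

If these expressions return empty lists against the live pages too, the problem is upstream: the rules never deliver any response to parse_item in the first place.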
Here is a link to the logs:
It works when I play around with BaseSpider:
from scrapy.spider import BaseSpider
from scrapy.http.request import Request

class SephoraspSpider(BaseSpider):
    name = "sephorasp"
    allowed_domains = ['sephora.ae']
    start_urls = [
        "https://www.sephora.ae/en/travel-size/make-up",
        "https://www.sephora.ae/en/perfume/women-perfume",
        "https://www.sephora.ae/en/makeup/eye/eyeshadow",
        "https://www.sephora.ae/en/skincare/moisturizers",
        "https://www.sephora.ae/en/gifts/palettes"
    ]

    def pro(self, response):
        item_links = response.xpath('//a[contains(@class, "level0")]/@href').extract()
        for a in item_links:
            yield Request(a, callback=self.end)

    def end(self, response):
        item_link = response.xpath('//a[@class="level2"]/@href').extract()
        for b in item_link:
            yield Request(b, callback=self.parse)

    def parse(self, response):
        page = response.xpath('//div[@class="product-info"]')
        for titles in page:
            Product = titles.xpath('.//a[@title]/text()').extract()
            Rate = titles.xpath('.//span[@class="price"]/text()').extract()
            yield {'Name': Product, 'Price': Rate}
Your XPaths are badly flawed:
Rule(LinkExtractor(restrict_xpaths='//li[@class="level0 nav-1 active first touch-dd parent"]')),
Rule(LinkExtractor(restrict_xpaths='//li[@class="level2 nav-1-1-1 active first"]'),
You are matching the entire class string, which can change at any time, and the tokens may appear in a different order. Just pick one class token that is most likely unique enough:
Rule(LinkExtractor(restrict_xpaths='//li[contains(@class,"level0")]')),
Rule(LinkExtractor(restrict_xpaths='//li[contains(@class,"level2")]')),
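The difference is easy to demonstrate offline (using lxml here, with invented menu markup): the exact-match expression silently returns nothing as soon as a single class token changes, while contains() keeps matching:

```python
from lxml import html

# Hypothetical menu markup where the "active" class happens to be absent.
doc = html.fromstring("""
<ul>
  <li class="level0 nav-1 first touch-dd parent"><a href="/en/makeup">Makeup</a></li>
  <li class="level2 nav-1-1-1 first"><a href="/en/makeup/eye">Eye</a></li>
</ul>
""")

# Exact match on the full class string: brittle, matches nothing here.
exact = doc.xpath('//li[@class="level0 nav-1 active first touch-dd parent"]')
print(len(exact))   # 0

# Matching a single stable token: still finds the item.
robust = doc.xpath('//li[contains(@class, "level0")]')
print(len(robust))  # 1
```

Because @class is compared as a plain string in XPath 1.0, any difference in token set or order breaks an exact match; contains() only requires the one token you care about to be present.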
Can you post the crawl log? You can do that with the command scrapy crawl sephorasp -s LOG_FILE=output.log, or with scrapy crawl sephorasp &> output.log. – Granitosaurus

Thank you for the reply, sir Granitosaurus. I've added what you asked for, but I couldn't upload it in a searchable format. I also ran my code with your corrected XPaths, but new errors showed up in the console I uploaded; the image above is the updated one. Thanks. – asker

Dear Granitosaurus, I captured the log as you suggested, but I can't upload it; it's huge. Can you tell me how? – asker

@SMth80 use some kind of pastebin, e.g. upload the log and share the link. – Granitosaurus

Okay sir, I will do that. By the way, if I write the code with BaseSpider it more or less works; I've updated the question above with the working version. However, I'd like to parse without typing in every URL. – asker