How do I extract items from JavaScript events in Scrapy?
This is the full code. I can't retrieve the product colors because they are dynamic. Take this URL as an example: it has multiple colors. I only need the names of the colors.
import scrapy
from scrapy.linkextractors import LinkExtractor


class WoolRich(scrapy.Spider):
    name = "WoolRich_Spider"
    allowed_domains = ['woolrich.com']
    start_urls = ['https://www.woolrich.com/men/?sort=featured&page=1']

    def parse(self, response):
        # Follow every product link found on the listing page.
        links = response.css('li.product> article> figure> a::attr(href)').extract()
        for link in links:
            # urljoin keeps absolute URLs as they are and resolves relative ones.
            yield scrapy.Request(response.urljoin(link),
                                 callback=self.parse_of_individual_page)
        # Follow the remaining links (pagination), skipping sort and size filter URLs.
        next_page = LinkExtractor(allow=[''], deny=['sort', 'size', 'Size', 'fsnf'])
        links = next_page.extract_links(response)
        for link in links:
            yield scrapy.Request(link.url,
                                 callback=self.parse)
        # response.css('div.productView-image').extract()

    def parse_of_individual_page(self, response):
        # Build the item locally; storing it on self would share state between callbacks.
        item = {
            'Product Name': response.css('h1.productView-title::text').extract(),
            'Style': response.css('.productView-product > div:nth-child(2) > strong:nth-child(1)::text').extract(),
            # extract_first() returns None instead of raising when no price is found.
            'Price': response.css('span.price::text').extract_first(),
            'Size': response.css('span.form-option-variant::text').extract(),
            'Features': response.css('#features-content > li::text').extract(),
            'Description': response.css('#details-content::text').extract(),
            'Path from home': response.css('a.breadcrumb-label::text').extract(),
            'Image links': response.css('div.zoom> a::attr(data-zoom-image)').extract()
        }
        yield item
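This solved the problem. I had overlooked this line in the HTML:

response.css('label.form-option-swatch> span::attr(title)').extract()

I'm not sure I understand your question, but you need to scrape page data that only appears after some JavaScript code has run. When your Scrapy request completes, the HTML has not fully loaded, right? I don't know if this is the best solution, but you can drive a real browser through something like WebDriver and scrape the page once it has finished loading.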
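To see why that selector is enough when the color names sit in the static HTML as title attributes, here is a minimal sketch using only the standard library. It is not Scrapy's implementation; it only mimics what `label.form-option-swatch> span::attr(title)` selects. The HTML fragment, the color names, and the names `SwatchTitleParser` and `extract_colors` are all made up for illustration, and the real page markup may differ.

```python
from html.parser import HTMLParser

# Hypothetical fragment modeled on the selector above; not copied from the real site.
SAMPLE_HTML = """
<div class="form-field">
  <label class="form-option form-option-swatch" for="attr_1">
    <span class="form-option-variant form-option-variant--color" title="Navy"></span>
  </label>
  <label class="form-option form-option-swatch" for="attr_2">
    <span class="form-option-variant form-option-variant--color" title="Olive"></span>
  </label>
  <label class="form-option" for="attr_3">
    <span class="form-option-variant" title="Large">L</span>
  </label>
</div>
"""

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SwatchTitleParser(HTMLParser):
    """Collect the title of each <span> whose direct parent is a
    <label> carrying the class form-option-swatch."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements as (tag, class list)
        self.colors = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and self.stack:
            parent_tag, parent_classes = self.stack[-1]
            title = attrs.get("title")
            if (parent_tag == "label"
                    and "form-option-swatch" in parent_classes
                    and title):
                self.colors.append(title)
        if tag not in VOID_TAGS:
            self.stack.append((tag, (attrs.get("class") or "").split()))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def extract_colors(html):
    parser = SwatchTitleParser()
    parser.feed(html)
    parser.close()
    return parser.colors


if __name__ == "__main__":
    print(extract_colors(SAMPLE_HTML))
```

In the spider itself, the one-line `response.css(...)` call from the answer does the same job. If the titles were only added by JavaScript after load, neither approach would see them, and a browser-driven fetch would be needed instead.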