Scrapy spider works fine but fails to get some results

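The spider works fine and returns information for about 208 products, but for some product detail pages it gives no result. I ran those product links individually in scrapy shell and they work fine, so why does the spider miss 25% of the product details?

I tried rotating user agents and applying different XPaths, but without success.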

import scrapy

from ..items import AmazonItem


class QuotesSpider(scrapy.Spider):
    name = 'pet'
    start_urls = ['https://www.amazon.co.uk/s?k=moleskine&rh=p_89%3AMoleskine&dc&qid=1567115653&rnid=1632651031&ref=sr_nr_p_89_1',
                  'https://www.amazon.co.uk/s?k=moleskine&rh=p_89%3AMoleskine&dc&page=2',
                  'https://www.amazon.co.uk/s?k=moleskine&rh=p_89%3AMoleskine&dc&page=3',
                  'https://www.amazon.co.uk/s?k=moleskine&rh=p_89%3AMoleskine&dc&page=4',
                  'https://www.amazon.co.uk/s?k=moleskine&rh=p_89%3AMoleskine&dc&page=5'
                  ]

    def parse(self, response):
        # Collect the product-detail links from the search results page.
        links = response.xpath("//h2/a[contains(@href,'/dp')]/@href").getall()
        for link in links:
            # urljoin resolves the relative href against the page URL.
            yield scrapy.Request(url=response.urljoin(link), callback=self.parse_details)

    def parse_details(self, response):
        item = AmazonItem()

        # Product title; None means the expected markup was not in the response.
        name = response.xpath("//*[@id='productTitle']/text()").get()
        if name is None:
            self.logger.info('No product title found at %s', response.url)
        else:
            name = name.replace('\n', '').strip()

        # Price: try the buy box first, then fall back to a generic price span.
        price = response.xpath("//span[@id='price_inside_buybox']/text()").get()
        if price is None:
            price = response.xpath("//span[@class='a-color-price']/text()").get()
            if price is None:
                price = 'No Price Available'
            self.logger.info('No buy box price found at %s', response.url)
        else:
            price = price.replace('\n', '').replace(' ', '')

        # Prime eligibility is inferred from the shipping message element.
        prime = response.xpath("//span[@id='price-shipping-message']/b").get()
        prime = 'Not Prime' if prime is None else 'Prime'

        sales_rank = response.xpath("//tr[@id='SalesRank']/td[@class='value']/text()").get()
        if sales_rank is None:
            sales_rank = 'No Sales Rank Available'
        else:
            sales_rank = sales_rank.replace('(', '').replace('\n', '')

        item['Name'] = name
        item['Price'] = price
        item['SalesRank'] = sales_rank
        item['Prime'] = prime
        item['Url'] = response.url
        yield item


Am I missing something?

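Can you check the indentation of the code you posted and also post your AmazonItem code? It would be best to remove any unnecessary parts from the example.

Check the response you actually get: if you are detected as a bot, Amazon may return an incomplete response.

Thanks for the reply. I found it: some pages return a captcha response. Is there any way to prevent that?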
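
A captcha page would explain why the same URLs work in scrapy shell but yield empty fields in the crawl. As a rough sketch, not a guaranteed fix, the spider could detect captcha responses and crawl more slowly. The names `CAPTCHA_MARKERS`, `looks_like_captcha` and `POLITE_SETTINGS` are hypothetical, and the marker strings are assumptions that should be checked against a real blocked response. The setting names are standard Scrapy settings, but the values are only illustrative.

```python
# Marker strings are assumptions; inspect a real blocked response and adjust.
CAPTCHA_MARKERS = (
    "/errors/validatecaptcha",
    "enter the characters you see below",
)


def looks_like_captcha(html: str) -> bool:
    """Return True if the page body contains any of the captcha markers."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


# Standard Scrapy settings that slow the crawl down; the values are guesses.
POLITE_SETTINGS = {
    "DOWNLOAD_DELAY": 2,                  # seconds between requests to a site
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # one request at a time per domain
    "AUTOTHROTTLE_ENABLED": True,         # adapt the delay to server latency
    "AUTOTHROTTLE_START_DELAY": 2,
}
```

In `parse_details`, one could call `looks_like_captcha(response.text)` before extracting any fields and, when it returns True, log the URL or re-schedule it with `response.request.replace(dont_filter=True)` instead of yielding an empty item. The dictionary can be assigned to the spider's `custom_settings` class attribute. Slowing down reduces the chance of being flagged but does not guarantee that captchas disappear.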