Python scrapy AttributeError: 'SoundseasySpider' object has no attribute 'crawler'
Tags: python, selenium, scrapy
I am trying to scrape some data from a web page, but sometimes I get this error:
AttributeError: 'SoundseasySpider' object has no attribute 'crawler'
Below is my code, which uses a selenium webdriver (the self.browser instance) to fetch data from dynamic pages:
import scrapy
from ProductsScraper.items import ProductDataItem, ProductDataLoader
from utilities.common import MODE_SINGLE
from utilities.DynamicPageLoader import DynamicPageLoader
def start_requests(self):
    # scrape multi page data
    for page_count, url in zip(self.pages_counts, self.start_urls):
        yield scrapy.Request(url=url, callback=self.multi_parse,
                             meta={'page_count': page_count},
                             dont_filter=True)

def multi_parse(self, response):
    """
    Method fetches the pages, gets the product url links and scrapes them
    by calling parse_product
    """
    selector = self.get_dynamic_page(url=response.url,
                                     page_count=response.meta.get('page_count', '1'))
    product_urls = selector.xpath('//div[@class="isp_product_info"]/a/@href').extract()
    self.logger.info('{} items should be scraped from the page: {},'
                     ' scroll_count:{}'.format(len(product_urls),
                                               response.url,
                                               response.meta.get('page_count', '1')))
    for product_url in product_urls:
        # construct absolute url
        url = "https://www.{}{}".format(self.allowed_domains[0], product_url)
        yield scrapy.Request(url=url, callback=self.parse_product, dont_filter=True)

def get_dynamic_page(self, url, page_count):
    """
    Fetch dynamic page using DynamicDownloader and return selector object
    """
    # construct search page url with the page count included
    pages_url = url + '&page_num={}'.format(page_count)
    self.logger.info("get_dynamic_page: {}".format(pages_url))
    self.browser.load_page(pages_url)
    return scrapy.Selector(text=self.browser.get_html_page())
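(Editor's aside on the absolute-URL construction above: the string formatting with `allowed_domains[0]` works when every `product_url` is a root-relative path, but the standard library's `urllib.parse.urljoin` handles relative links more robustly, and scrapy's `response.urljoin` wraps the same logic. A small illustration with a hypothetical base URL:)

```python
from urllib.parse import urljoin

# Hypothetical search-page URL; any absolute path joined against it
# replaces the path component, keeping scheme and host.
base = "https://www.example.com/search?q=speakers"
absolute = urljoin(base, "/products/item-1")
print(absolute)  # https://www.example.com/products/item-1
```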
What am I doing wrong? Any help is appreciated.
Edit:
I am getting the following exception:
File "/home/user/python3.6.1/lib/python3.6/site-packages/Twisted-17.9.0-py3.6-linux-x86_64.egg/twisted/internet/defer.py", line 1384, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/home/user/python3.6.1/lib/python3.6/site-packages/Twisted-17.9.0-py3.6-linux-x86_64.egg/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/user/python3.6.1/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/Twisted-17.9.0-py3.6-linux-x86_64.egg/twisted/internet/defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 66, in process_exception
spider=spider)
File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/scrapy/downloadermiddlewares/retry.py", line 61, in process_exception
return self._retry(request, exception, spider)
File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/scrapy/downloadermiddlewares/retry.py", line 71, in _retry
stats = spider.crawler.stats
AttributeError: 'SoundsEasySpider' object has no attribute 'crawler'
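(For context: `spider.crawler` is normally attached by Scrapy itself when the spider is built through `Spider.from_crawler`; if the spider object is created some other way, the retry middleware's `spider.crawler.stats` lookup fails exactly as in the last frame above. A simplified, dependency-free model of that mechanism — `Crawler` and `Spider` here are stand-ins, not the real scrapy classes:)

```python
class Crawler:
    """Stand-in for scrapy's Crawler object (simplified assumption)."""
    def __init__(self):
        self.stats = {}

class Spider:
    """Stand-in mirroring how scrapy.Spider.from_crawler works."""
    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # The framework builds the spider and attaches the crawler here.
        spider = cls(*args, **kwargs)
        spider.crawler = crawler
        return spider

# Built the way the framework builds it -> attribute is present
good = Spider.from_crawler(Crawler())
print(hasattr(good, "crawler"))  # True

# Instantiated by hand -> attribute never set, as in the traceback
bad = Spider()
print(hasattr(bad, "crawler"))   # False
```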
The problem was anti-scraping protection: the server was rejecting the requests.

Could you include the import section? I suspect the cause of the error might be there.

Done, but I guess if something were wrong with the imports it would never work, yet sometimes I do get good results.

The error message should show which line of code is the problem; that is why you should always include the full error (traceback) in a question, as text, not a screenshot.

Good point, furas. I guess I was wrong about the import statements.

It could be that your selenium browser is not loading the page fast enough, so you get an empty or partial page. You could try adding a delay, or more specific wait instructions, to make sure the page has loaded before parsing the response.

Make it a class, dude: class SoundseasySpider(scrapy.Spider), and follow the proper scrapy conventions.
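(The "add a delay or more specific wait instructions" suggestion above can be sketched generically: instead of a fixed sleep, poll until the thing you need is actually present, which is the pattern Selenium's `WebDriverWait(...).until(...)` implements. A dependency-free sketch; `FakePage` and its slow-loading behaviour are purely illustrative:)

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Mirrors the idea behind selenium's WebDriverWait.until: keep checking
    for a readiness signal rather than sleeping a fixed amount.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError("condition not met within {} seconds".format(timeout))

# Illustration: a hypothetical page object that only reports its product
# links after a few polls, simulating a slowly rendering dynamic page.
class FakePage:
    def __init__(self):
        self._loads = 0
    def product_links(self):
        self._loads += 1
        return ["/p/1", "/p/2"] if self._loads >= 3 else []

page = FakePage()
links = wait_until(page.product_links, timeout=5.0, poll=0.01)
print(links)  # ['/p/1', '/p/2']
```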