Python Scrapy AttributeError: 'SoundseasySpider' object has no attribute 'crawler'


I am trying to scrape some data from a web page, but sometimes I get this error:

AttributeError: 'SoundseasySpider' object has no attribute 'crawler'
Below is my code, which uses a Selenium web driver (the self.browser instance) to fetch data from dynamic pages:

import scrapy
from ProductsScraper.items import ProductDataItem, ProductDataLoader
from utilities.common import MODE_SINGLE
from utilities.DynamicPageLoader import DynamicPageLoader

def start_requests(self):
    # scrape multi page data
    for page_count, url in zip(self.pages_counts, self.start_urls):
        yield scrapy.Request(url=url, callback=self.multi_parse,
                             meta={'page_count': page_count}, 
                             dont_filter=True)

def multi_parse(self, response):
    """
    Fetch the pages, extract the product URL links, and scrape each one
    by calling parse_product.
    """
    selector = self.get_dynamic_page(url=response.url,
                                     page_count=response.meta.get('page_count', '1'))
    product_urls = selector.xpath('//div[@class="isp_product_info"]/a/@href').extract()
    self.logger.info('{} items should be scraped from the page: {},'
                     ' scroll_count:{}'.format(len(product_urls),
                                               response.url, response.meta.get('page_count', '1')))
    for product_url in product_urls:
        # construct absolute url
        url = "https://www.{}{}".format(self.allowed_domains[0], product_url)
        yield scrapy.Request(url=url, callback=self.parse_product, dont_filter=True)

def get_dynamic_page(self, url, page_count):
    """
    Fetch a dynamic page using DynamicPageLoader and return a Selector object.
    """
    # construct search page url with the page count included
    pages_url = url + '&page_num={}'.format(page_count)
    self.logger.info("get_dynamic_page: {}".format(pages_url))
    self.browser.load_page(pages_url)
    return scrapy.Selector(text=self.browser.get_html_page())
What am I doing wrong? Any help is appreciated.

Edit: I am getting the following exception:

  File "/home/user/python3.6.1/lib/python3.6/site-packages/Twisted-17.9.0-py3.6-linux-x86_64.egg/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/user/python3.6.1/lib/python3.6/site-packages/Twisted-17.9.0-py3.6-linux-x86_64.egg/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/user/python3.6.1/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/Twisted-17.9.0-py3.6-linux-x86_64.egg/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 66, in process_exception
    spider=spider)
  File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/scrapy/downloadermiddlewares/retry.py", line 61, in process_exception
    return self._retry(request, exception, spider)
  File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/scrapy/downloadermiddlewares/retry.py", line 71, in _retry
    stats = spider.crawler.stats
AttributeError: 'SoundsEasySpider' object has no attribute 'crawler'

The problem was the site's anti-scraping protection; the server was rejecting the requests. I have enabled …
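
The edit above breaks off before saying what was enabled. Purely for context, settings commonly enabled in a Scrapy project's settings.py when a server starts rejecting requests look like the following; these are illustrative values, not the asker's actual configuration:

# settings.py -- illustrative anti-blocking settings, not the asker's actual config
USER_AGENT = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/120.0 Safari/537.36')  # identify as a real browser
DOWNLOAD_DELAY = 2            # space requests out so traffic looks less bot-like
RETRY_ENABLED = True          # the retry middleware seen in the traceback
RETRY_TIMES = 5               # retry dropped connections a few extra times
AUTOTHROTTLE_ENABLED = True   # adapt the request rate to server responsiveness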

Comments:

Can you include the imports section? I suspect the cause of the error may be there.

Done, but I would guess that if something were wrong with the imports it would never work, whereas sometimes I do get good results.

The error message should show which line of code is the problem; that is why you should always post the complete error (the traceback) with a question, as text rather than a screenshot.

Good point, furas. I think I was wrong about the import statements.

This could be because your Selenium browser is not loading the page fast enough, leaving you with an empty or partially loaded page. You could try adding a delay, or a more specific wait instruction, to make sure the page has loaded before you parse the response (see the sketch below).
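
Regarding the wait suggestion above: a minimal sketch of an explicit wait inside get_dynamic_page could look like this. It assumes self.browser exposes the underlying Selenium WebDriver as self.browser.driver, which is a guess about the DynamicPageLoader wrapper; the CSS class matches the XPath used in multi_parse:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_dynamic_page(self, url, page_count):
    pages_url = url + '&page_num={}'.format(page_count)
    self.browser.load_page(pages_url)
    # Block until at least one product block is present instead of reading
    # the HTML immediately; self.browser.driver is an assumed attribute
    # exposing the raw Selenium WebDriver under the wrapper.
    WebDriverWait(self.browser.driver, timeout=10).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'isp_product_info'))
    )
    return scrapy.Selector(text=self.browser.get_html_page())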
Make it a class, dude: class SoundseasySpider(scrapy.Spider), and follow the proper scrapy conventions.
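
On that last comment: the methods in the question are defined at module level, not on a spider class. Scrapy only sets the crawler attribute when it builds a scrapy.Spider subclass through its normal from_crawler path, and that attribute is exactly what the retry middleware reads as spider.crawler.stats in the traceback above. A minimal sketch of the conventional layout follows; the name, domain, and URL values are placeholders:

import scrapy

class SoundseasySpider(scrapy.Spider):
    # Subclassing scrapy.Spider means Scrapy instantiates the spider via
    # Spider.from_crawler, which binds self.crawler; the retry middleware
    # needs that attribute (spider.crawler.stats) when a request fails.
    name = 'soundseasy'                      # placeholder spider name
    allowed_domains = ['example.com']        # placeholder domain
    start_urls = ['https://www.example.com/search?q=headphones']  # placeholder
    pages_counts = [1]                       # placeholder; zipped with start_urls

    def start_requests(self):
        for page_count, url in zip(self.pages_counts, self.start_urls):
            yield scrapy.Request(url=url, callback=self.multi_parse,
                                 meta={'page_count': page_count},
                                 dont_filter=True)

    # multi_parse, parse_product and get_dynamic_page from the question
    # would be indented under the class exactly like start_requests above.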