Python scrapy AttributeError: 'SoundseasySpider' object has no attribute 'crawler'
Tags: python, selenium, scrapy
I am trying to scrape some data from a web page, but sometimes I get this error:
AttributeError: 'SoundseasySpider' object has no attribute 'crawler'
Below is my code, which uses a selenium webdriver (the self.browser instance) to fetch data from dynamic pages:
import scrapy
from ProductsScraper.items import ProductDataItem, ProductDataLoader
from utilities.common import MODE_SINGLE
from utilities.DynamicPageLoader import DynamicPageLoader
def start_requests(self):
    # scrape multi page data
    for page_count, url in zip(self.pages_counts, self.start_urls):
        yield scrapy.Request(url=url, callback=self.multi_parse,
                             meta={'page_count': page_count},
                             dont_filter=True)

def multi_parse(self, response):
    """
    Method fetches the pages, gets the product url links and scrapes them
    by calling parse_product
    """
    selector = self.get_dynamic_page(url=response.url,
                                     page_count=response.meta.get('page_count', '1'))
    product_urls = selector.xpath('//div[@class="isp_product_info"]/a/@href').extract()
    self.logger.info('{} items should be scraped from the page: {},'
                     ' scroll_count:{}'.format(len(product_urls),
                                               response.url,
                                               response.meta.get('page_count', '1')))
    for product_url in product_urls:
        # construct absolute url
        url = "https://www.{}{}".format(self.allowed_domains[0], product_url)
        yield scrapy.Request(url=url, callback=self.parse_product, dont_filter=True)

def get_dynamic_page(self, url, page_count):
    """
    Fetch dynamic page using DynamicDownloader and return selector object
    """
    # construct search page url with the page count included
    pages_url = url + '&page_num={}'.format(page_count)
    self.logger.info("get_dynamic_page: {}".format(pages_url))
    self.browser.load_page(pages_url)
    return scrapy.Selector(text=self.browser.get_html_page())
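(Editor's aside on the absolute-URL construction above: the string formatting with `allowed_domains[0]` works when every `product_url` is a root-relative path, but the standard library's `urllib.parse.urljoin` handles relative links more robustly, and scrapy's `response.urljoin` wraps the same logic. A small illustration with a hypothetical base URL:)

```python
from urllib.parse import urljoin

# Hypothetical search-page URL; any absolute path joined against it
# replaces the path component, keeping scheme and host.
base = "https://www.example.com/search?q=speakers"
absolute = urljoin(base, "/products/item-1")
print(absolute)  # https://www.example.com/products/item-1
```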
What am I doing wrong? Any help is appreciated.
Edit:
I am getting the following exception:
File "/home/user/python3.6.1/lib/python3.6/site-packages/Twisted-17.9.0-py3.6-linux-x86_64.egg/twisted/internet/defer.py", line 1384, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/home/user/python3.6.1/lib/python3.6/site-packages/Twisted-17.9.0-py3.6-linux-x86_64.egg/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/user/python3.6.1/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/Twisted-17.9.0-py3.6-linux-x86_64.egg/twisted/internet/defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 66, in process_exception
spider=spider)
File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/scrapy/downloadermiddlewares/retry.py", line 61, in process_exception
return self._retry(request, exception, spider)
File "/home/swampblu/python3.6.1/lib/python3.6/site-packages/scrapy/downloadermiddlewares/retry.py", line 71, in _retry
stats = spider.crawler.stats
AttributeError: 'SoundsEasySpider' object has no attribute 'crawler'
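(For context: `spider.crawler` is normally attached by Scrapy itself when the spider is built through `Spider.from_crawler`; if the spider object is created some other way, the retry middleware's `spider.crawler.stats` lookup fails exactly as in the last frame above. A simplified, dependency-free model of that mechanism — `Crawler` and `Spider` here are stand-ins, not the real scrapy classes:)

```python
class Crawler:
    """Stand-in for scrapy's Crawler object (simplified assumption)."""
    def __init__(self):
        self.stats = {}

class Spider:
    """Stand-in mirroring how scrapy.Spider.from_crawler works."""
    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # The framework builds the spider and attaches the crawler here.
        spider = cls(*args, **kwargs)
        spider.crawler = crawler
        return spider

# Built the way the framework builds it -> attribute is present
good = Spider.from_crawler(Crawler())
print(hasattr(good, "crawler"))  # True

# Instantiated by hand -> attribute never set, as in the traceback
bad = Spider()
print(hasattr(bad, "crawler"))   # False
```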
The problem was anti-scraping protection: the server was rejecting the requests.

Could you include the import section? I suspect the cause of the error might be there.

Done, but I guess if something were wrong with the imports it would never work, yet sometimes I do get good results.

The error message should show which line of code is the problem; that is why you should always include the full error (traceback) in a question, as text, not a screenshot.

Good point, furas. I guess I was wrong about the import statements.

It could be that your selenium browser is not loading the page fast enough, so you get an empty or partial page. You could try adding a delay, or more specific wait instructions, to make sure the page has loaded before parsing the response.

Make it a class, dude: class SoundseasySpider(scrapy.Spider), and follow the proper scrapy conventions.
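(The "add a delay or more specific wait instructions" suggestion above can be sketched generically: instead of a fixed sleep, poll until the thing you need is actually present, which is the pattern Selenium's `WebDriverWait(...).until(...)` implements. A dependency-free sketch; `FakePage` and its slow-loading behaviour are purely illustrative:)

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Mirrors the idea behind selenium's WebDriverWait.until: keep checking
    for a readiness signal rather than sleeping a fixed amount.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError("condition not met within {} seconds".format(timeout))

# Illustration: a hypothetical page object that only reports its product
# links after a few polls, simulating a slowly rendering dynamic page.
class FakePage:
    def __init__(self):
        self._loads = 0
    def product_links(self):
        self._loads += 1
        return ["/p/1", "/p/2"] if self._loads >= 3 else []

page = FakePage()
links = wait_until(page.product_links, timeout=5.0, poll=0.01)
print(links)  # ['/p/1', '/p/2']
```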