
Python Scrapy/Selenium skips most of my iterable

Tags: python, selenium, scrapy

I'm trying to scrape a clothing retail shopping site. For some reason, whenever I run the code below, I get only a few items from three of the categories (the nth-children defined in parse()) and a large number of items from li:nth-child(5).
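The original code block is not reproduced here, so the following is a hypothetical minimal reconstruction of the pattern being described, with a single Selenium driver shared across Scrapy callbacks; the class name, URLs, and category selectors are guesses, and only parse_items and the price selector are taken from the traceback:

import scrapy
from selenium import webdriver

class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['http://www.example.com/']

    def __init__(self, *args, **kwargs):
        super(ExampleSpider, self).__init__(*args, **kwargs)
        # One driver instance shared by every callback in the crawl
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Three categories picked out as nth-children of a nav list
        for n in (3, 4, 5):
            sel = 'ul.nav > li:nth-child(%d) > a::attr(href)' % n
            for href in response.css(sel).extract():
                yield scrapy.Request(response.urljoin(href), callback=self.parse_items)

    def parse_items(self, response):
        self.driver.get(response.url)
        # Scrapy schedules requests concurrently, so another callback can
        # tell the driver to load a different URL while this loop is still
        # iterating over elements from the current page; those elements
        # then go stale, producing the exception shown below.
        for node in self.driver.find_elements_by_css_selector('div.product-tile'):
            price = node.find_element_by_css_selector(
                'div.flex-wrapper--prod-details > div.pricing > div.price > div.standardprice').text
            yield {'price': price}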

Sometimes I get the following error:

2017-01-09 20:33:30 [scrapy] ERROR: Spider error processing <GET http://www.example.com/jackets> (referer: http://www.example.com/)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/BeardedMac/projects/thecurvyline-scraper/spiders/example.py", line 47, in parse_items
    price = node.find_element_by_css_selector('div.flex-wrapper--prod-details > div.pricing > div.price > div.standardprice').text
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 307, in find_element_by_css_selector
    return self.find_element(by=By.CSS_SELECTOR, value=css_selector)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 511, in find_element
    {"using": by, "value": value})['value']
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 494, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
StaleElementReferenceException: Message: The element reference is stale. Either the element is no longer attached to the DOM or the page has been refreshed

In short, what's happening here is that Scrapy is concurrent while your Selenium setup is not, so your Selenium driver gets confused: throughout the crawl, Scrapy keeps asking the driver to load new URLs while it is still working with the old one.

To avoid this, you can disable concurrency in your spider by setting CONCURRENT_REQUESTS to 1. For example, add this to your settings.py file:

CONCURRENT_REQUESTS = 1
Or, if you'd rather limit this setting to a single spider, add a custom_settings entry to it:

class MySpider(scrapy.Spider):
    custom_settings = {'CONCURRENT_REQUESTS': 1}
Alternatively, if you'd like to keep the concurrency (which is a very good thing), you can try replacing Selenium with a more Python-friendly technique, such as the one sketched below.

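One technique along those lines is Splash via the scrapy-splash package (an assumption; the answer itself does not name a specific tool). A minimal sketch, assuming scrapy-splash is installed and a Splash server is running on localhost:8050, with the middleware settings taken from the scrapy-splash README and the selectors from the traceback above:

import scrapy
from scrapy_splash import SplashRequest

class MySpider(scrapy.Spider):
    name = 'example'
    custom_settings = {
        # Splash renders JavaScript server-side, so requests stay concurrent
        'SPLASH_URL': 'http://localhost:8050',
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
        },
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
    }

    def start_requests(self):
        # Each request is rendered independently, so there is no shared
        # browser instance for concurrent callbacks to fight over.
        yield SplashRequest('http://www.example.com/jackets',
                            callback=self.parse_items, args={'wait': 1.0})

    def parse_items(self, response):
        # The response body is the rendered HTML; parse it with ordinary
        # Scrapy selectors instead of live WebElement references.
        for node in response.css('div.flex-wrapper--prod-details'):
            yield {'price': node.css(
                'div.pricing > div.price > div.standardprice::text').extract_first()}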