Python: Scrapy & Selenium to scrape a page with a "Load more" button

python selenium scrapy

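I am trying to scrape a page, but on that page I have to press a button several times to load all of the content, which is why I use Selenium before parsing and extracting the links.

Below is the error. What am I doing wrong?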

2018-08-31 20:18:56 [twisted] CRITICAL:
Traceback (most recent call last):
  File "d:\python-projects\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "d:\python-projects\lib\site-packages\scrapy\crawler.py", line 81, in crawl
    start_requests = iter(self.spider.start_requests())
TypeError: 'NoneType' object is not iterable

My code:

import scrapy
from scrapy.selector import Selector
from scrapy.spider import Spider
from scrapy.utils.markup import remove_tags
from selenium import webdriver


class Listings(Spider):
    name = "adver"
    base_url = 'https://www.test.com/xxxxx1'

    def start_requests(self):
        self.driver = webdriver.Firefox(executable_path=r'D:\python-projects\geckodriver.exe')
        self.driver.get(self.base_url)
        while True:
            load_content = self.driver.find_element_by_xpath('/html/body/div[5]/div[3]/div[1]/button')
            try:
                self.parse(driver.page_source)
                load_content.click()
            except:
                break
        self.driver.close()


    def parse(self, response):
        for link in response.css ("a.ad-title-link"):
            ad_link = link.css('a::attr(href)').extract_first()
            yield {'link': ad_link}

You need

I suggest you use

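Shouldn't start_requests return something?

start_requests should only fetch the HTML and pass it on to self.parse(driver.page_source).

Accessing the button through a full XPath, without a class, an id, or anything else unique, is very fragile: any element added or changed in the tree will cause an exception. Check whether the load button has a unique class or id and use that instead. If you post some of the HTML here, we can help.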
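To make the two points above concrete: the traceback shows Scrapy calling iter() on the return value of start_requests, and a method that neither returns nor yields anything hands back None, hence the TypeError. Separately, the click loop can be written against a short selector instead of a full XPath. The following is only a rough sketch under assumptions: the selector "button.load-more" is hypothetical (the real page's markup is not shown in the question), and FakeDriver is a stand-in invented here so the example runs without a browser. The helper relies only on find_elements(by, value), click() and page_source, which Selenium's WebDriver provides.

```python
def load_all_content(driver, by, value, max_clicks=100):
    """Click the "load more" button until it is gone, then return the final HTML.

    `driver` is expected to behave like a Selenium WebDriver: it needs
    find_elements(by, value) and a page_source attribute.
    """
    for _ in range(max_clicks):
        # find_elements returns an empty list when nothing matches,
        # so no bare `except:` is needed to detect the end of the list.
        buttons = driver.find_elements(by, value)
        if not buttons:
            break
        buttons[0].click()
    return driver.page_source


class FakeButton:
    """Hypothetical stand-in for a Selenium WebElement."""

    def __init__(self, driver):
        self._driver = driver

    def click(self):
        self._driver.clicks += 1


class FakeDriver:
    """Hypothetical stand-in for a WebDriver; the button vanishes after `pages` clicks."""

    def __init__(self, pages):
        self.pages = pages
        self.clicks = 0

    def find_elements(self, by, value):
        return [FakeButton(self)] if self.clicks < self.pages else []

    @property
    def page_source(self):
        # One link initially, plus one more link per click.
        return "".join(
            f'<a class="ad-title-link" href="/ad/{i}">ad</a>'
            for i in range(self.clicks + 1)
        )


# "css selector" is the string value of selenium's By.CSS_SELECTOR.
html = load_all_content(FakeDriver(pages=3), "css selector", "button.load-more")
print(html.count("ad-title-link"))
```

With a real browser, the same helper would presumably be called with a webdriver.Firefox() instance and By.CSS_SELECTOR, and the returned HTML could be wrapped in scrapy.Selector(text=html) so that the a.ad-title-link selector from the question still applies. start_requests itself would still have to yield something, for example a scrapy.Request for base_url, rather than only calling self.parse. A real page may also need an explicit wait between clicks, which this sketch leaves out.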