Python Scrapy Selenium pagination problem

python selenium pagination scrapy

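I can't figure out how to paginate on this site (see start_urls). The spider opens the webdriver, successfully scrapes the data from the first page, and then closes the webdriver while the second page is loading.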

import scrapy
from lxml.html import fromstring
from ..items import PontsItems
from selenium import webdriver


class Names(scrapy.Spider):
    name = 'enseafr'

    download_delay = 5.0

    start_urls = ['https://www.ponts.org/fr/annuaire/recherche?result=1&annuaire_mode=standard&annuaire_as_no=&keyword=&PersonneNom=&PersonnePrenom=&DiplomePromo%5B%5D=2023&DiplomePromo%5B%5D=2022&DiplomePromo%5B%5D=2021&DiplomePromo%5B%5D=2020&DiplomePromo%5B%5D=2019&DiplomePromo%5B%5D=2018&DiplomePromo%5B%5D=2017&DiplomePromo%5B%5D=2016&DiplomePromo%5B%5D=2015&DiplomePromo%5B%5D=2014&DiplomePromo%5B%5D=2013&DiplomePromo%5B%5D=2012&DiplomePromo%5B%5D=2011&DiplomePromo%5B%5D=2010']

    def __init__(self):
        self.driver = webdriver.Chrome()

    def parse(self, response):
        items = PontsItems()
        self.driver.get(response.url)

        next = self.driver.find_element_by_xpath('//a[@class="next"]')
        #'//*[@id="zoneAnnuaire_layout"]/div[3]/div[2]/div[3]/div[11]/a[4]'
        while True:

            try:
                next.click()

                for item in response.xpath('//div[@class="single_desc"]'):
                    name = item.xpath('./div[@class="single_libel"]/a/text()').get().strip()
                    description = item.xpath('./div[@class="single_details"]/div/text()').get()
                    description = fromstring(description).text_content().strip()
                    year = item.xpath('./div[@class="single_details"]/div/b/text()').get()

                    items['name'] = name
                    items['description'] = description
                    items['year'] = year
                    yield items

            except:
                break

        self.driver.close()

I've really been stuck on this for days.

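I'm not sure how to use PontsItems(), but I can use an empty list to show how to return the data, as below. If an error occurs, it returns the current list, and each time the "next" link is clicked the rows of the new page are appended to the list. You only have one element, so use find_element.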

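# Note: Selenium 4.3+ dropped find_element(s)_by_xpath; use find_element(s)(By.XPATH, ...) there.
items = []
while True:
    try:
        # Read the rows from the live browser DOM; Scrapy's response still holds page 1.
        descs = self.driver.find_elements_by_xpath('//div[@class="single_desc"]')
        for item in descs:
            # A WebElement has no .xpath(); use find_element_by_xpath() and .text.
            name = item.find_element_by_xpath('./div[@class="single_libel"]/a').text.strip()
            details = item.find_element_by_xpath('./div[@class="single_details"]/div')
            description = details.text.strip()
            year = details.find_element_by_xpath('./b').text
            items.append({'name': name, 'description': description, 'year': year})
        # Scrape first, then click, so page 1 is not skipped. Re-locate the link on
        # every pass: the old reference goes stale once the page changes.
        next_link = self.driver.find_element_by_xpath('//a[@class="next"]')
        next_link.click()
    except Exception:
        # Any failure (including no "next" link on the last page) ends the loop.
        break
# Scrapy expects one item per yield, not a list.
yield from items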
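The loop described in this answer can also be written as a small standalone generator. This is only a sketch under assumptions: walk_pages, extract and max_pages are hypothetical names that do not appear in the thread, and the driver is assumed to expose Selenium's find_elements(by, value) method, where the string "xpath" is the value of By.XPATH. Using find_elements (plural) for the "next" link returns an empty list on the last page, so no exception handling is needed to stop.

```python
def walk_pages(driver, row_xpath, next_xpath, extract, max_pages=1000):
    """Yield one record per row on every page reachable through the "next" link.

    driver  -- a Selenium WebDriver (any object with a matching find_elements works)
    extract -- hypothetical callback that turns one row element into a dict
    """
    for _ in range(max_pages):  # hard cap so a broken "next" link cannot loop forever
        # Read rows from the live browser DOM; Scrapy's `response` never changes.
        for row in driver.find_elements("xpath", row_xpath):
            yield extract(row)

        # Re-locate the link on every page: a reference found on the previous
        # page raises StaleElementReferenceException once the DOM is replaced.
        links = driver.find_elements("xpath", next_xpath)
        if not links:
            break  # no "next" link left: this was the last page
        links[0].click()
```

Inside the spider's parse method this would be called after self.driver.get(response.url), for example as yield from walk_pages(self.driver, '//div[@class="single_desc"]', '//a[@class="next"]', extract_row), where extract_row is a function you supply. The sketch does not wait for the next page to finish loading; if the site swaps its content asynchronously, an explicit wait (Selenium's WebDriverWait) after the click would probably be needed.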

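Hi, what's going wrong? I get the following message: selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document (Session info: chrome=85.0.4183.102)

If you use Chrome to click the element, then you should search in self.driver.page_source instead of response, or simply use self.driver.find_element_by_xpath instead of response.xpath to look up the values.

When I use self.driver.find_element_by_xpath I get: TypeError: 'WebElement' object is not ..., not to mention something completely new like [984:16892:0918/012923.411:ERROR:device_event_log_impl.cc(208)] [01:29:23.411] Bluetooth: bluetooth_adapter_winrt.cc:1074 Getting Default Adapter failed. Could you explain this magic to me? Or, better, rewrite the script so that it actually works in some way?

First, put next inside the try/except block.

Did you test it on that site? For me, the only thing it changed is that it now gets the data of the last item, and it still closes before moving on to the second page.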
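The suggestion to search self.driver.page_source instead of response can be sketched as follows. It is a hedged illustration, not a tested solution for this site: parse_rows is a hypothetical helper name, the XPath expressions are copied from the question, and the page markup is assumed to match them. It uses lxml, which the question already imports, so no WebElement methods are involved at all.

```python
from lxml.html import fromstring


def parse_rows(page_source):
    """Return one dict per result row found in the rendered HTML string."""
    tree = fromstring(page_source)
    rows = []
    for desc in tree.xpath('//div[@class="single_desc"]'):
        # string(...) yields '' when the node is missing, so .strip() is always safe.
        name = desc.xpath('string(./div[@class="single_libel"]/a)').strip()
        details = desc.xpath('./div[@class="single_details"]/div')
        description = details[0].text_content().strip() if details else ''
        year = desc.xpath('string(./div[@class="single_details"]/div/b)').strip()
        # Build a new dict for every row so earlier rows are never overwritten.
        rows.append({'name': name, 'description': description, 'year': year})
    return rows
```

In the spider this would be called as parse_rows(self.driver.page_source) after each page change, yielding each returned dict in turn. Because the string is re-read from the browser every time, it reflects the page Chrome is currently showing, whereas response only ever contains the first page Scrapy downloaded.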