Looping through home listings with Selenium

selenium pagination

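I am trying to use Selenium to loop through a home listings site, but I am having trouble getting a while loop to step through the pages. I don't know how to find the total number of listing pages on the site, so I tested the while loop with 20 pages (max_page_count = 20), because I know there are at least 20 pages of listings. I am considering two (2) ways of iterating over the pages, each at a different stage of progress:

Using {}.format(page) and creating a counter to iterate through the pages

Using Selenium's click function to click the page elements shown in the provided picture

I have checked and know that I am able to scrape the price element from the page, but I am finding that the click function does not work.

Here is my code: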

driver_location = 'C:/Users/oefel/Downloads/geckodriver-v0.26.0-win64'
os.environ['webdriver.firefox.driver'] = driver_location 
driver = webdriver.Firefox(driver_location)
driver.get("https://www.tinyhomebuilders.com/tiny-house-marketplace/search")
driver.implicitly_wait(50)
driver.maximize_window()

tiny_house_price = []

page_count = 0
max_page_count = 20

while (page_count < max_page_count):
        html_soup = BeautifulSoup(driver.page_source, 'lxml')

        scraped_price = driver.find_elements_by_css_selector("div.card-body > div.price")
        for price in scraped_price:
                tiny_house_price.append(price.text)
        print(tiny_house_price)

        page = driver.find_elements_by_css_selector('.pagination > li > a.href').click()

        page_count += 1

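I would greatly appreciate any help.

Thanks, everyone!

In this case you don't need Selenium to get the text. You can get the page text by making a GET request to the page and reading the HTML from the response. The page number is passed as a URL parameter, so you only need to loop over it to get the desired output. A sample follows: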

import requests
from bs4 import BeautifulSoup

tiny_house_price = []

page_count = 1
max_page_count = 20

# Pages are counted from 1, so use <= to include page 20 as well.
while page_count <= max_page_count:
    r = requests.get('https://www.tinyhomebuilders.com/tiny-house-marketplace/search?page={}'.format(page_count))
    html_soup = BeautifulSoup(r.text, 'html.parser')
    scraped_price = html_soup.select("div.card-body > div.price")
    for price in scraped_price:
        tiny_house_price.append(price.text.strip())
    print(tiny_house_price)

    page_count += 1
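
The question also mentions not knowing the total number of pages. A minimal sketch of one way around that, assuming (not verified for this site) that a page number past the last page returns no price elements: keep requesting pages until one comes back empty. The helper names collect_prices and fetch_prices_with_requests and the max_pages safety cap are hypothetical, introduced only for illustration.

```python
from typing import Callable, List


def collect_prices(fetch_page: Callable[[int], List[str]], max_pages: int = 100) -> List[str]:
    """Call fetch_page(1), fetch_page(2), ... until a page yields no prices."""
    all_prices: List[str] = []
    for page in range(1, max_pages + 1):
        prices = fetch_page(page)
        if not prices:
            # Assumption: an empty page means we ran past the last page.
            break
        all_prices.extend(prices)
    return all_prices


def fetch_prices_with_requests(page: int) -> List[str]:
    """Fetch one results page and return its price strings."""
    # Imported here so collect_prices can be used without these packages.
    import requests
    from bs4 import BeautifulSoup

    url = "https://www.tinyhomebuilders.com/tiny-house-marketplace/search"
    response = requests.get(url, params={"page": page}, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Same selector as in the sample above.
    return [tag.get_text(strip=True) for tag in soup.select("div.card-body > div.price")]
```

Calling collect_prices(fetch_prices_with_requests) would walk the pages starting at 1. If the site instead repeats the last page for out-of-range numbers, the max_pages cap is what ends the loop.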

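The click does not work in your case because the selector is incorrect: a.href matches an <a> element with the class "href", not a link that has an href attribute, and find_elements returns a list, which has no click() method. You have to find the element by its link text and click it. The link text will be the page number.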
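
Finding the link by its text could look like the sketch below. It assumes that the pagination link for the next page number is visible on the current page and that the click has loaded the new results before the prices are read; neither is verified here. The strings "link text" and "css selector" are the values of Selenium's By.LINK_TEXT and By.CSS_SELECTOR constants, passed to find_element and find_elements. The function names are hypothetical.

```python
from typing import List

LINK_TEXT = "link text"        # same value as selenium's By.LINK_TEXT
CSS_SELECTOR = "css selector"  # same value as selenium's By.CSS_SELECTOR
PRICE_SELECTOR = "div.card-body > div.price"


def click_page_link(driver, page_number: int) -> None:
    """Click the pagination link whose visible text is the page number."""
    # find_element returns a single element, which does have click().
    link = driver.find_element(LINK_TEXT, str(page_number))
    link.click()


def scrape_by_clicking(driver, max_pages: int) -> List[str]:
    """Read prices from the current page, then click through to each next page."""
    prices: List[str] = []
    for page in range(1, max_pages + 1):
        if page > 1:
            # Page 1 is already open; later pages are reached by clicking.
            click_page_link(driver, page)
        for element in driver.find_elements(CSS_SELECTOR, PRICE_SELECTOR):
            prices.append(element.text.strip())
    return prices
```

With a driver that has already opened the search page, as in the question, scrape_by_clicking(driver, 20) would replace the while loop.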

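If you still want to use Selenium, you can apply the same page URL parameter logic used above. You only need to open the web page with Selenium, get the page source, and navigate to the next page once you have the HTML.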
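
A minimal sketch of that URL-parameter approach with Selenium, under the same assumptions about the site as the requests sample. Note one swap: instead of parsing driver.page_source with BeautifulSoup, this version reads the prices through Selenium's own find_elements, which avoids a second parser. The "css selector" string is the value of Selenium's By.CSS_SELECTOR constant; BASE_URL and scrape_by_url are hypothetical names.

```python
from typing import List

BASE_URL = "https://www.tinyhomebuilders.com/tiny-house-marketplace/search?page={}"
CSS_SELECTOR = "css selector"  # same value as selenium's By.CSS_SELECTOR
PRICE_SELECTOR = "div.card-body > div.price"


def scrape_by_url(driver, max_pages: int) -> List[str]:
    """Open each results page by URL and collect the price text from it."""
    prices: List[str] = []
    for page in range(1, max_pages + 1):
        # Navigate directly to the page instead of clicking a pagination link.
        driver.get(BASE_URL.format(page))
        for element in driver.find_elements(CSS_SELECTOR, PRICE_SELECTOR):
            prices.append(element.text.strip())
    return prices
```

Because each page is opened with driver.get, no pagination link has to be located or clicked.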