How to get information from multiple pages using Selenium WebDriver in Python?

python selenium selenium-webdriver

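I am currently trying to get the titles of all lots (pages 1 to 33) of the "Hong Kong Watches 2.0" auction on the Bonhams website. I am new to Python and Selenium, but I tried to get the results with the code below. The code gives me the results I want, but only for page 1. After that, it repeats the page 1 results over and over. It seems the loop that clicks through to the next page does not work. Can anyone help me fix this loop?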

Below is the code I used:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

driver=webdriver.Chrome()
driver.get('https://www.bonhams.com/auctions/25281/?category=results#/!')

while True:
    next_page_btn =driver.find_elements_by_xpath("//*[@id='lots']/div[2]/div[5]/div/a[10]/div")
    if len(next_page_btn) <1:
        print("no more pages left")
        break
    else:
        titles = driver.find_elements_by_xpath("//*[@class='firstLine']")
        titles = [title.text for title in titles]
        print(titles)

    element = WebDriverWait(driver,5).until(expected_conditions.element_to_be_clickable((By.ID,'lots')))
    driver.execute_script("return arguments[0].scrollIntoView();", element)
    element.click()

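You don't need the selenium library to scrape this data. You can use the requests and BeautifulSoup libraries instead to fetch the data for all pages: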

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0",
    "Accept": "application/json"
}

page_num = 1
title_list = []  # one list of titles per page

while True:
    # JSON endpoint behind the results page; 12 lots per request
    url = 'https://www.bonhams.com/api/v1/lots/25281/?category=results&length=12&minimal=false&page={}'.format(page_num)
    print("===url===", url)
    response = requests.get(url, headers=headers).json()
    max_lot = response['max_lot']
    last_iSaleLotNo = 0
    titles = []
    for lot in response['lots']:
        last_iSaleLotNo = lot['lot_id_combined']
        # 'styled_title' is an HTML fragment; the title is in the 'firstLine' div
        title = BeautifulSoup(lot['styled_title'], 'lxml').find("div", {'class': 'firstLine'}).text.strip()
        titles.append(title)
    title_list.append(titles)
    print("===titles===", titles)
    # stop once the last lot on this page is the final lot of the sale
    if int(max_lot) == int(last_iSaleLotNo):
        break
    page_num += 1

print(title_list)

Output for the first page:

['ROLEX. TWO SETS OF SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1970s', 'ROLEX. TWO SETS OF RARE SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1980s', 'PATEK PHILIPPE. A SET OF THREE RARE LIMOGES PORCELAIN AND ENAMEL DISHES', 'Bvlgari/MAUBOUSSIN. TWO SETS OF CUFFLINKS', 'BOUCHERON/MONTBLANC. TWO SETS OF CUFFLINKS', 'PATEK PHILIPPE. TWO SETS OF CUFFLINKS', 'Jaeger-LeCoultre. A Gilt Brass Table Clock With 8-Days Power Reserve and Alarm', 'Cartier & LeCoultre. A group of three gilt brass table clocks (Alarm/Alarm Worldtime/Engraved dial)', 'Jaeger-LeCoultre. A Gilt Brass Table Clock With 8-Days Power Reserve', 'Reuge. A Gold Plated Musical Automaton Open Face Pocket Watch with Alarm', 'Imhof. An Attractive Gilt Brass Table Clock With Polychrome Enamel Dial', 'Vacheron Constantin. A Large Polished Metal Perpetual Calendar Wall Clock']
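
To find this endpoint yourself, open the Network tab in the browser's developer tools and click the next-page button; the JSON response data shows up there.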
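
The stop condition in the answer's loop can be tried offline. The following is only a minimal sketch under stated assumptions: `iter_pages`, `fake_fetch`, and the `max_pages` safety cap are hypothetical names introduced for illustration, and the fake payload only imitates the `lots`, `lot_id_combined`, and `max_lot` keys used in the answer. It does not contact the real API.

```python
from typing import Callable, Dict, Iterator, List


def iter_pages(fetch_page: Callable[[int], Dict], max_pages: int = 1000) -> Iterator[List[Dict]]:
    """Yield the list of lots for each page until the final lot is reached.

    fetch_page(page_num) is a hypothetical callable that returns the decoded
    JSON for one page, with 'lots' and 'max_lot' keys as in the answer.
    max_pages is an assumed safety cap so a changed API cannot loop forever.
    """
    for page_num in range(1, max_pages + 1):
        payload = fetch_page(page_num)
        lots = payload.get("lots", [])
        if not lots:
            return  # empty page: nothing more to read
        yield lots
        # Same test as the answer: the last lot on this page is the sale's last lot.
        if int(lots[-1]["lot_id_combined"]) == int(payload["max_lot"]):
            return


def fake_fetch(page_num: int) -> Dict:
    """Offline stand-in for the API: three pages holding five lots in total."""
    pages = {1: [1, 2], 2: [3, 4], 3: [5]}
    return {
        "max_lot": 5,
        "lots": [{"lot_id_combined": n} for n in pages.get(page_num, [])],
    }


pages = list(iter_pages(fake_fetch))
lot_ids = [lot["lot_id_combined"] for page in pages for lot in page]
print(lot_ids)  # [1, 2, 3, 4, 5]
```

With the real site, `fetch_page` could be a small function that calls `requests.get(url.format(page_num), headers=headers).json()` as in the answer; keeping the network call separate from the loop makes the paging logic easy to check on its own.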

Hey bharatk, thanks for your quick reply. However, when I copy this code into the terminal, it doesn't give me any output. I get this at the end of the code: ... titles.append(title) ... title_list.append(titles) if max_lot == last_iSaleLotNo: break page_num+=1 print(title_list) and then it does nothing. Does this have something to do with indentation? Or should I change the user-agent line? As I said, I'm new to Python, so I apologize if this is a rather silly question.

@B.vandenBoomen I've updated my answer with a print statement. Now you will see that there are 32 pages in total and that it fetches the data page by page.

The print statement works. However, I'm still stuck with the same problem. It only prints the titles of the first page and keeps printing those titles. I want it to jump to the next page, grab the titles from that page, then jump to the next one.

I don't understand why someone would downvote my answer without any comment or explanation. If you disagree with my answer, please comment here.

I don't know who did that. I'm very happy with your clear answer. Thanks