Navigating pagination with Selenium in Python


python selenium selenium-webdriver web-scraping


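I am scraping this website with Python and Selenium. My code works, but it currently scrapes only the first page. I want to iterate over all the pages and scrape each one, but the site handles pagination in an odd way. How can I go through the pages and scrape them one by one?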

Pagination HTML:

<div class="pagination">
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,1" title="Go to first page">First</a>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,1" title="Go to previous page">Prev</a>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,1" title="Go to page 1">1</a>
    <span class="current">2</span>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,3" title="Go to page 3">3</a>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,4" title="Go to page 4">4</a>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,3" title="Go to next page">Next</a>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,4" title="Go to last page">Last</a>
</div>

First, use the following to find the total number of pages:

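from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

ins = webdriver.Chrome()  # assumption: the original does not show how `ins` is created
ins.get('https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList/10702380,1')
ins.find_element(By.CLASS_NAME, "pagination")  # raises if the pagination block is missing
source = BeautifulSoup(ins.page_source, "html.parser")
div = source.find_all('div', {'class': 'pagination'})
all_as = div[0].find_all('a')
total = 0

# The link just before "Next" carries the highest page number.
for i in range(len(all_as)):
    if 'Next' in all_as[i].text:
        total = int(all_as[i-1].text)
        break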
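As a quick offline check of the "link before Next" idea, the sketch below applies it to the pagination HTML from the question using only the standard library. The names `AnchorTextCollector` and `last_page_number` are illustrative and not part of the original answer, and this version matches the link text "Next" exactly rather than by substring.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the visible text of every <a> element, in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._inside_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside_anchor = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside_anchor = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current anchor.
        if self._inside_anchor:
            self.texts[-1] += data


def last_page_number(pagination_html):
    """Return the number shown on the link just before "Next"."""
    collector = AnchorTextCollector()
    collector.feed(pagination_html)
    texts = [t.strip() for t in collector.texts]
    next_index = texts.index("Next")  # ValueError if there is no Next link
    return int(texts[next_index - 1])


# Pagination markup copied from the question (page 2 is the current page).
SAMPLE = """
<div class="pagination">
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,1" title="Go to first page">First</a>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,1" title="Go to previous page">Prev</a>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,1" title="Go to page 1">1</a>
    <span class="current">2</span>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,3" title="Go to page 3">3</a>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,4" title="Go to page 4">4</a>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,3" title="Go to next page">Next</a>
    <a href="/PlanningGIS/LLPG/WeeklyList/41826123,4" title="Go to last page">Last</a>
</div>
"""

print(last_page_number(SAMPLE))
```

For the sample markup this reports 4 pages. Whether the site keeps a "Next" link on the last page is not shown in the question, so the lookup may need a fallback there.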

Now just loop over that range:

for count in range(1, int(total) + 1):  # pages are numbered from 1
    ins.get('https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList/10702380,{}'.format(count))

The page number goes up by one on each iteration; for every page, take the page source and extract the data from it.
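The URL-based loop can be sketched as a small helper, assuming the page number is simply the value after the comma in the URL. `page_urls` and `scrape_pages` are hypothetical names, and `load_page` and `extract` are caller-supplied callables (for example `ins.get` and your own parsing function), so nothing here depends on a particular scraper.

```python
import time

# URL pattern taken from the answer above; {} is replaced by the page number.
BASE_URL = "https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList/10702380,{}"


def page_urls(total, template=BASE_URL):
    """Build the URL of every page, numbered from 1 to total inclusive."""
    return [template.format(number) for number in range(1, total + 1)]


def scrape_pages(total, load_page, extract, delay=1.0):
    """Visit each page in turn and collect whatever extract() returns.

    load_page(url) should navigate to the URL; extract() should return a
    list of records for the page that is currently loaded.
    """
    rows = []
    for url in page_urls(total):
        load_page(url)
        rows.extend(extract())
        time.sleep(delay)  # pause between requests, as the answer recommends
    return rows
```

With Selenium this might be called as `scrape_pages(total, ins.get, my_extract)`, where `my_extract` is whatever function you already use to pull data out of `ins.page_source`.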

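Note: when you click through from one page to the next, do not forget to sleep so the new page has time to load.

Before you automate any scenario, always write down the manual steps for carrying it out. The manual steps for what you want (as I understand it from the question) are: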

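1) Go to the site
2) Select the first week option
3) Click Search
4) Get the data from every page
5) Load the URL again
6) Select the second week option
7) Click Search
8) Get the data from every page
... and so on.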
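You have a loop that selects the different weeks, but inside each iteration of that loop you also need a loop that goes through all the pages. Because you do not have one, your code returns data from the first page only.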
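The shape of the two nested loops can be sketched independently of Selenium. Everything below is hypothetical scaffolding: `scrape_all_weeks` and its four callables stand in for "select the week and click Search", "read the page count", "scrape the current page" and "click Next".

```python
def scrape_all_weeks(week_count, open_week, count_pages, scrape_page, go_to_next_page):
    """Outer loop over weeks, inner loop over the pages of each week."""
    all_rows = []
    for week in range(week_count):
        open_week(week)  # reload the URL, pick the week, click Search
        pages = count_pages()
        for page in range(pages):
            all_rows.extend(scrape_page())
            if page < pages - 1:
                go_to_next_page()  # no click is needed after the last page
    return all_rows


# Tiny in-memory stand-in for the site: two weeks with 2 and 1 pages.
FAKE_SITE = {0: [["a", "b"], ["c"]], 1: [["d"]]}
position = {"week": 0, "page": 0}


def open_week(week):
    position["week"], position["page"] = week, 0


def count_pages():
    return len(FAKE_SITE[position["week"]])


def scrape_page():
    return FAKE_SITE[position["week"]][position["page"]]


def go_to_next_page():
    position["page"] += 1


print(scrape_all_weeks(2, open_week, count_pages, scrape_page, go_to_next_page))
```

Keeping the navigation steps as separate callables makes it easy to test the looping logic first and plug in the real Selenium calls afterwards.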
Another problem is how you locate the "Next" button:

driver.find_element_by_xpath('//*[@id="form1"]/div[3]/a[4]').click()
You are selecting the fourth <a> element, and its position is not fixed: in the pagination HTML from the question, for example, the fourth link is page 3, not "Next". It is safer to locate the "Next" link by its text and to read the page count from the link just before it. I did that with the code below:

number_of_pages = int(driver.find_element(By.XPATH, "//a[contains(text(),'Next')]/preceding-sibling::a[1]").text)

Once you have the page count in number_of_pages, you only need to click the "Next" button number_of_pages - 1 times.

Final code for your main function:

import json
import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

# driver, url and getData() come from the rest of your script.
def main():
    all_data = []
    week_xpath = "//select[@class='formitem' and @id='selWeek']"
    select = Select(driver.find_element(By.XPATH, week_xpath))
    list_options = select.options
    for item in range(len(list_options)):
        # Re-locate the <select> on every pass, because the page is reloaded at the end of each iteration.
        select = Select(driver.find_element(By.XPATH, week_xpath))
        select.select_by_index(item)
        driver.find_element(By.CSS_SELECTOR, "input.formbutton#csbtnSearch").click()
        number_of_pages = int(driver.find_element(By.XPATH, "//a[contains(text(),'Next')]/preceding-sibling::a[1]").text)
        for j in range(number_of_pages - 1):
            all_data.extend(getData())
            driver.find_element(By.XPATH, "//a[contains(text(),'Next')]").click()
            time.sleep(1)
        all_data.extend(getData())  # the loop above stops on the last page without scraping it
        driver.get(url)
    with open('wiltshire.json', 'w+') as f:
        json.dump(all_data, f)
    driver.quit()

The approach below worked well for me:

driver.find_element(By.LINK_TEXT, "3").click()
driver.find_element(By.LINK_TEXT, "4").click()
....
driver.find_element(By.LINK_TEXT, "Next").click()

Comments:

Have you tried scrolling down with JavaScript?

Where is the code block that handles the pagination?

@DebanjanB That is exactly where I need help. The first three anchor tags are always there, and I do not know how to iterate over the rest because every page lists different pages.

Thanks for doing this, it makes perfect sense. One question: what does "preceding-sibling::a[1]" do? Is it getting the number 13?

@AbdulJamac Yes, I have also covered that in the updated answer.

So now you have 13? Does it click the Next button 13 times, once per page?

@AbdulJamac Suppose there are 13 pages: the inner loop runs 12 times. Since the first page is already loaded, we only need to click the Next button 12 times (13 - 1).