How to navigate through the pages of a website using Selenium and Python


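I am scraping this website () using Python and Selenium. My code runs, but it currently scrapes only the first page. I want to iterate over all the pages and scrape every view on them, but the site handles pagination in an odd way. How can I go through the pages and scrape them one by one?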

My source code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException, WebDriverException
import time

opt = webdriver.ChromeOptions()
opt.add_argument("--ignore-certificate-errors")
opt.add_argument("--start-maximized")

driver = webdriver.Chrome(executable_path=r"C:\Users\fit foodie\PycharmProjects\Selenium\Browser\chromedriver.exe", options=opt)

driver.get(url="http://rera.rajasthan.gov.in/")
search= driver.find_element_by_xpath("//*[@id='liSearch']/a").click()
proj_src=driver.find_element_by_xpath("//*[@id='liSearch']/ul/li[1]/a").click()

search_btn = driver.find_element_by_xpath('//*[@id="btn_SearchProjectSubmit"]').click()


def page():
    while True:
        try:
            driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(
                EC.element_to_be_clickable((By.XPATH, "//*[@id='OuterProjectGrid']/div[4]/div[4]/a"))))
            driver.find_element_by_xpath("//*[@id='OuterProjectGrid']/div[4]/div[4]/a").click()
            print("Navigating to Next Page")
        except (TimeoutException, WebDriverException) as e:
            print("Last page reached")
            break
I am unable to paginate with this code.


Try this:

def page():
    count = 0
    while True:
        try:
            count += 1
            driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(
                EC.element_to_be_clickable((By.XPATH, "//*[@id='OuterProjectGrid']/div[4]/div[4]/a[1]"))))
            driver.find_element_by_xpath("//*[@id='OuterProjectGrid']/div[4]/div[4]/a["+str(count)+"]").click()
            print("Navigating to Next Page")
            time.sleep(5)
        except (TimeoutException, WebDriverException) as e:
            print("Last page reached")
            break

page()

For pagination, use the following CSS selector and add a delay after each click:

def page():
    i=2
    while True:
        try:
            driver.execute_script("arguments[0].scrollIntoView();", WebDriverWait(driver, 20).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, "a[data-p='{}']".format(i)))))
            driver.find_element_by_css_selector("a[data-p='{}']".format(i)).click()

            print("Navigating to Next Page " + str(i))
            i=i+1
            time.sleep(1)
        except (TimeoutException, WebDriverException) as e:
            print("Last page reached")
            break

page()
Output: console snapshot
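
The loop above only clicks through the pager, while the question also asks to scrape each page. The following is a minimal sketch, not the answer author's code, of how the `a[data-p='N']` selector could be combined with a per-page callback. The names `walk_pages`, `selenium_link_finder`, and `scrape` are hypothetical, and the sketch assumes that a missing `data-p` link means the last page has been reached.

```python
def walk_pages(find_link, scrape, first_page=2):
    """Scrape the page already on screen, then follow page links in order.

    find_link(n) must return a clickable element for page n, or None when
    no such link exists; scrape(n) is called once per visited page.
    Returns the number of the last page visited.
    """
    scrape(first_page - 1)  # the search opens on the page before first_page
    page = first_page
    while True:
        link = find_link(page)
        if link is None:  # no link for this page: treat it as the end
            return page - 1
        link.click()
        scrape(page)
        page += 1


def selenium_link_finder(driver, timeout=20):
    """Build a find_link function backed by a Selenium driver (assumed wiring)."""
    # Imported here so walk_pages itself does not depend on Selenium.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def find_link(page):
        selector = "a[data-p='{}']".format(page)
        try:
            link = WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, selector)))
        except TimeoutException:
            return None
        # Same scroll step as in the answer, so the link is in view before the click.
        driver.execute_script("arguments[0].scrollIntoView();", link)
        return link

    return find_link
```

With the `driver` from the question, this could be called as `walk_pages(selenium_link_finder(driver), scrape)`, where `scrape` is a function of your own that reads the rows of the grid. Because the grid is refreshed after each click, `scrape` may need its own wait (or the `time.sleep` used above) before it reads the rows.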


If your goal is to fetch all the table data from every page, you can also do it without Selenium. You can use the Python requests module and send a POST request:

import requests
data={
    "PageSize" :1250,
    "page": 1
}
res=requests.post("http://rera.rajasthan.gov.in/Home/GetProjectsList",data=data).json()
for item in res['Data']['Items']:
    print(item['DistrictName'],item['ProjectName'],item['ProjectTypeName'],item['PromoterName'],item['ApplicationNo'],item['CertificateNo'])
This prints the rows from all pages, like this:

Jaipur ATHARV APPARTMENT Group Housing SHP HOME LLP Revoked Project Revoked Project
Jaipur JVJ DREAM RESIDENCY Group Housing JVJ DREAM DEVELOPERS LLP RAJ-RERA-APP-P-2020-2214 (19/03/2020) RAJ/P/2020/1262 (29/05/2020)
Chittorgarh SHARDA ROYAL GREENS Plotted Development Choudhary Infraheight Private Limited RAJ-RERA-APP-P-2020-2201 (17/03/2020) RAJ/P/2020/1261 (29/05/2020)
Tonk GREEN CITY-A BLOCK Plotted Development SUN INDIA REALHOME LLP RAJ-RERA-APP-P-2020-2173 (04/03/2020) RAJ/P/2020/1260 (29/05/2020)
Ajmer Dream Homz Group Housing G S DREAMHOME LLP RAJ-RERA-APP-P-2020-2188 (13/03/2020) RAJ/P/2020/1259 (20/05/2020)
Jaipur KEDIA'S AMARA Group Housing KEDIA BUILDERS AND COLONIZERS PRIVATE LIMITED RAJ-RERA-APP-P-2020-2224 (13/05/2020) RAJ/P/2020/1258 (18/05/2020)
Jaipur Kuber Garden Group Housing PUNIT ESTATES PRIVATE LIMITED RAJ-RERA-APP-P-2020-2221 (29/04/2020) RAJ/P/2020/1257 (04/05/2020)
Kota SHUBH SAVERA Plotted Development SANTOSH  SAINI RAJ-RERA-APP-P-2020-2222 (29/04/2020) RAJ/P/2020/1256 (02/05/2020)
Udaipur MIRACLE Group Housing BHOOMISHIV BUILDERS LLP RAJ-RERA-APP-P-2020-2117 (15/02/2020) RAJ/P/2020/1255 (02/05/2020)
Jaipur NANDAN PRIME VILLAS Group Housing NARENDRA KUMAR AGARWAL RAJ-RERA-APP-P-2020-2184 (11/03/2020) RAJ/P/2020/1254 (28/04/2020)
Jaipur Akshat Kanota Estate-Phase 3 Group Housing AKSHAT APARTMENTS PRIVATE LIMITED RAJ-RERA-APP-P-2020-2052 (24/01/2020) RAJ/P/2020/1253 (20/04/2020)
Jaipur SHREE RADHA KRISHNA APARTMENT Group Housing GURUSAIKRIPA BUILDERS LLP RAJ-RERA-APP-P-2020-2213 (19/03/2020) RAJ/P/2020/1252 (16/04/2020)
Jodhpur Mangaldeep Darshan Group Housing Mangaldeep DaRSHAN RAJ-RERA-APP-P-2020-2186 (12/03/2020) RAJ/P/2020/1251 (16/04/2020)
Sri Ganganagar SHREENATH ENCLAVE Plotted Development ANANDAM HEIGHTS DEVELOPERS PRIVATE LIMITED RAJ-RERA-APP-P-2020-2144 (27/02/2020) RAJ/P/2020/1250 (16/04/2020)
Jaipur SHEKHAWAT CREST Group Housing M R S B INFRA PROJECT PRIVATE LIMITED RAJ-RERA-APP-P-2020-2181 (11/03/2020) RAJ/P/2020/1249 (12/04/2020)
Kota S.S. TIRUPATI TOWER Mixed (Residential And Commercial) S S TIRUPATI INFRAPROJECTS RAJ-RERA-APP-P-2020-2123 (18/02/2020) RAJ/P/2020/1248 (12/04/2020)
Jhalawar Green Villas Group Housing CHAUDHARY BHOORAMAL DEVELOPERS RAJ-RERA-APP-P-2020-2139 (25/02/2020) RAJ/P/2020/1247 (09/04/2020)
Ajmer Samriddhi's Dynasty Group Housing SANKALP REALMART PVT LTD RAJ-RERA-APP-P-2020-2073 (01/02/2020) RAJ/P/2020/1246 (27/03/2020)
Udaipur ARCHI'S  LOTUS PARK Group Housing ARCHI BUILDMART PRIVATE LIMITED RAJ-RERA-APP-P-2020-2171 (03/03/2020) RAJ/P/2020/1245 (27/03/2020)
Alwar KRISHAN KUNJ Plotted Development CHHOTE LAL MEENA RAJ-RERA-APP-P-2020-2067 (29/01/2020) RAJ/P/2020/1244 (27/03/2020)
Jodhpur SHANKHESHWAR NAGAR Plotted Development BALWANT  RAM RAJ-RERA-APP-P-2020-2095 (10/02/2020) RAJ/P/2020/1243 (27/03/2020)
Jodhpur VEERPRATAP INDUSTRIAL PARK Plotted Development VICTORIA INFRA HOLDINGS PRIVATE LIMITED RAJ-RERA-APP-P-2019-1699 (23/10/2019) RAJ/P/2020/1242 (27/03/2020)
Jaipur Ram Awas Group Housing Shubhashish Builders and Developers RAJ-RERA-APP-P-2020-2023 (17/01/2020) RAJ/P/2020/1241 (27/03/2020)
Sikar SHREE HANUMAN HEIGHTS Commercial MAHADEV BUILDERS AND DEVELOPERS RAJ-RERA-APP-P-2020-2166 (03/03/2020) RAJ/P/2020/1240 (27/03/2020)
Sikar MADHUVAN HOMES Group Housing RAJENDRA SINGH KHICHAR RAJ-RERA-APP-P-2020-2155 (02/03/2020) RAJ/P/2020/1239 (27/03/2020)
Baran SUMERU SOHAM Mixed (Residential And Commercial) SUMERU LIFE SPACE INDIA PRIVATE LIMITED RAJ-RERA-APP-P-2020-2172 (03/03/2020) RAJ/P/2020/1238 (27/03/2020)
Jodhpur ASHAPURNA ANMOL PHASE-I Group Housing ASHAPURNA BUILDCON LIMITED RAJ-RERA-APP-P-2020-2090 (07/02/2020) RAJ/P/2020/1237 (27/03/2020)
Sirohi AYODHYAPURAM SHEOGANJ Group Housing RAMBHADEEP BUILDCON PRIVATE LIMITED  RAJ-RERA-APP-P-2020-2111 (14/02/2020) RAJ/P/2020/1236 (27/03/2020)
Jaipur Bhavyaa Green Zenith Group Housing BHAVYAA GREEN BUILDERS RAJ-RERA-APP-P-2020-2163 (03/03/2020) RAJ/P/2020/1235 (20/03/2020)
Dholpur G.K. CITY Group Housing G K Builders RAJ-RERA-APP-P-2020-2065 (29/01/2020) RAJ/P/2020/1234 (20/03/2020)
Udaipur ARCHI'S PEARL PARADISE Group Housing ARCHI CIVIL CONSTRUCTION PRIVATE LIMITED RAJ-RERA-APP-P-2020-2142 (27/02/2020) RAJ/P/2020/1233 (20/03/2020)
Jaipur Stareef Suites 88 Group Housing Arihant Prime Buildtech LLP RAJ-RERA-APP-P-2020-2083 (05/02/2020) RAJ/P/2020/1232 (20/03/2020)
Jaipur HARITWAL CITY - D Plotted Development BHARURAM  JAT RAJ-RERA-APP-P-2020-2119 (17/02/2020) RAJ/P/2020/1231 (19/03/2020)
Jodhpur CMJAY LORDI PANDIT JI PACKAGE-10 JODHPUR Group Housing JODHPUR DEVELOPMENT AUTHORITY RAJ-RERA-APP-P-2020-2191 (13/03/2020) RAJ/P/2020/1230 (18/03/2020)
Jaipur Vedic Villas Phase- II Group Housing KEDIA BUILDERS AND COLONIZERS PRIVATE LIMITED RAJ-RERA-APP-P-2020-2169 (03/03/2020) RAJ/P/2020/1229 (12/03/2020)
Tonk SHREE GANESH VATIKA Plotted Development RAM KRISHAN COLONIZERS AND DEVELOPEPRS PRIVATE LIMITED RAJ-RERA-APP-P-2020-2158 (02/03/2020) RAJ/P/2020/1228 (11/03/2020)
Jaipur Vinayak Residency A+B+C (Extension) Plotted Development Vinayak Developers RAJ-RERA-APP-P-2020-2092 (10/02/2020) RAJ/P/2020/1226 (11/03/2020)
Jaipur NIRANJAN VIHAR EXTENSION Plotted Development SHRI GOVARDHAN ESTATES PRIVATE LIMITED  RAJ-RERA-APP-P-2020-2099 (11/02/2020) RAJ/P/2020/1225 (11/03/2020)
Jaipur SHREE PARSHVANATH ENCLAVE Group Housing PARSHVANATH INFRA PROJECT RAJ-RERA-APP-P-2020-2030 (21/01/2020) RAJ/P/2020/1224 (11/03/2020)
Jaipur Vrinda Gardens Phase V Group Housing Vista Housing RAJ-RERA-APP-P-2020-2097 (11/02/2020) RAJ/P/2020/1223 (06/03/2020)
Jaipur Ashiana Amantran Phase II Group Housing Ashiana Housing Limited RAJ-RERA-APP-P-2020-2125 (19/02/2020) RAJ/P/2020/1221 (06/03/2020)
Jaipur MANGLAM AANANDA PHASE III (B) Group Housing MANGLAM BUILD DEVELOPERS LIMITED RAJ-RERA-APP-P-2020-2152 (29/02/2020) RAJ/P/2020/1220 (06/03/2020)
Sirohi Karan Heights Group Housing Samdarshi Builders RAJ-RERA-APP-P-2020-2043 (23/01/2020) RAJ/P/2020/1219 (04/03/2020)
Alwar Krish City Centre Commercial Narmada Asbestos Pipes Private Limited RAJ-RERA-APP-P-2020-2021 (16/01/2020) RAJ/P/2020/1218 (04/03/2020)
Bhilwara OSTWAL EMPIRE-1 Plotted Development KULDEEP UMRAOSINGH OSTWAL RAJ-RERA-APP-P-2020-2040 (22/01/2020) RAJ/P/2020/1217 (04/03/2020)
Bhilwara OSTWAL EMPIRE-2 Plotted Development UMRAOSINGH PRITHVIRAJ OSTWAL RAJ-RERA-APP-P-2020-2039 (22/01/2020) RAJ/P/2020/1216 (04/03/2020)
Kota AKANSHA DEEP HEIGHTS Group Housing AKANSHA INFRA HOUSING PROJECTS RAJ-RERA-APP-P-2020-2122 (17/02/2020) RAJ/P/2020/1215 (04/03/2020)
Jodhpur NAKSHATRA Group Housing VISION ASSOCIATES RAJ-RERA-APP-P-2020-2070 (31/01/2020) RAJ/P/2020/1214 (03/03/2020)
Jodhpur CMJAY CHOKHA JODHPUR Group Housing JODHPUR DEVELOPMENT AUTHORITY RAJ-RERA-APP-P-2019-1514 (26/07/2019) RAJ/P/2020/1213 (02/03/2020)
Bikaner Shanti Nilay Group Housing Shanti Infrapromoters Private Limited RAJ-RERA-APP-P-2020-2036 (22/01/2020) RAJ/P/2020/1212 (02/03/2020)
Jaipur GOVINDAM TOWER Group Housing BRIJHARI HOMES LLP RAJ-RERA-APP-P-2020-2089 (07/02/2020) RAJ/P/2020/1208 (24/02/2020)
Jaipur Mukhya Mantri Rajya Sahayak Awasiya Karamchari Yojana Group Housing RAJASTHAN HOUSING BOARD RAJ-RERA-APP-P-2020-2126 (19/02/2020) RAJ/P/2020/1207 (21/02/2020)
Jaipur Ayush Market Plotted Development RAJASTHAN HOUSING BOARD RAJ-RERA-APP-P-2020-2128 (20/02/2020) RAJ/P/2020/1206 (21/02/2020)
Jaipur Kedia's The Oxygen Phase II Group Housing Radha Govind Colonizers RAJ-RERA-APP-P-2020-2103 (11/02/2020) RAJ/P/2020/1205 (19/02/2020)
Alwar Terra Aashray Group Housing Terra Realcon Private Limited RAJ-RERA-APP-P-2019-1530 (31/07/2019) RAJ/P/2020/1204 (19/02/2020)
Jodhpur EWS-335&LIG-153 Houses at Barli Scheme, Jodhpur under MGSY Group Housing RAJASTHAN HOUSING BOARD RAJ-RERA-APP-P-2020-2121 (17/02/2020) RAJ/P/2020/1203 (18/02/2020)
Jaipur GANESH VIHAR Plotted Development BIRDA RAM MEENA RAJ-RERA-APP-P-2019-1801 (24/12/2019) RAJ/P/2020/1202 (18/02/2020)
Jaipur SUMAN ENCLAVE H-BLOCK Plotted Development MS SAMRIDHI BUILDDEV PVT LTD RAJ-RERA-APP-P-2019-1802 (24/12/2019) RAJ/P/2020/1201 (18/02/2020)
Jaipur SOMYA SKY CREST Group Housing SOMYA BUILDHOME LLP RAJ-RERA-APP-P-2020-2062 (28/01/2020) RAJ/P/2020/1200 (17/02/2020)
Kota Neelkanth Residency Plotted Development Kailash Chand Malviya RAJ-RERA-APP-P-2019-1684 (09/10/2019) RAJ/P/2020/1199 (12/02/2020)
Jaipur VEDIC VILLAS PHASE-I Group Housing KEDIA BUILDERS AND COLONIZERS PRIVATE LIMITED RAJ-RERA-APP-P-2020-2072 (31/01/2020) RAJ/P/2020/1198 (12/02/2020)
Jaipur GOVINDAM PARADISE Group Housing BRIJHARI BUILDHOME LLP RAJ-RERA-APP-P-2020-2068 (29/01/2020) RAJ/P/2020/1197 (12/02/2020)
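
Requesting 1250 rows in a single call depends on the server accepting such a large `PageSize`. As a hedged alternative, the sketch below asks for smaller pages until a short or empty page comes back. It assumes, without this having been verified against the site, that the endpoint honors `page` values above 1 and that every response has the `Data` / `Items` shape used above. `iter_items` and `post_projects` are hypothetical names.

```python
def iter_items(fetch_page, page_size=100):
    """Yield items page by page until the server returns a short or empty page.

    fetch_page(payload) must return the decoded JSON response as a dict.
    """
    page = 1
    while True:
        payload = {"PageSize": page_size, "page": page}
        items = fetch_page(payload)["Data"]["Items"]
        if not items:
            break
        yield from items
        if len(items) < page_size:  # a short page is taken to be the last one
            break
        page += 1


def post_projects(payload):
    """Send one POST request to the endpoint used in the answer above."""
    import requests  # imported here so iter_items works without requests installed

    url = "http://rera.rajasthan.gov.in/Home/GetProjectsList"
    return requests.post(url, data=payload).json()
```

Usage would look like `for item in iter_items(post_projects): print(item['ProjectName'])`. Passing the fetch function in as an argument keeps the paging logic separate from the network call, so it can be tried with canned responses first.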

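To scrape all the result pages from the search within the website using Selenium and Python, you need to induce WebDriverWait for element_to_be_clickable(), and you can use the following approach: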

  • Code Block:

    driver.get("http://rera.rajasthan.gov.in/ProjectSearch")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='dropdown-toggle' and contains(., 'Search')]"))).click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='dropdown-toggle' and contains(., 'Search')]//following::ul[1]/li/a[text()='Project Search']"))).click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@class='btn btn-primary']"))).click()
    while True:
        try:
            WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='ds4u-footer']//div[@class='ds4u-pager']//a")))
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='ds4u-footer']//div[@class='ds4u-pager']//a[contains(@class, 'ds4u-selected')]//following::a[1]/span"))).click()
            print("Clicked for next page")
        except TimeoutException:
            print("No more pages to navigate")
            break
    driver.quit()
    
  • Console Output:

    Clicked for next page
    Clicked for next page
    Clicked for next page
    ...
    ...
    ...
    No more pages to navigate
    

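It works, but only up to page 7; after page 7 it jumps straight to the last page.

It works, but mostly after page 27 it jumps to the last page. Looking at the XPaths for the first 10 pages, I found that while pages 1-5 are displayed their XPaths are a[1], a[2], a[3], a[4], a[5], and so on. After clicking 5, the page links keep showing XPaths a[4] and a[5] repeatedly instead of increasing, so we probably cannot use XPath here and should use a CSS selector. Still, thanks for the help, @DebanjanB.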
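
The observation in this comment can be illustrated with a toy model. The model is an assumption for illustration only and does not reproduce the site's actual markup: it imagines a pager that shows a sliding window of five links centered on the current page. In such a pager, the position of the next page's link (what `a[n]` selects in a positional XPath) stops growing once the window starts to slide, while an attribute such as `data-p` keeps identifying the page directly.

```python
def pager_window(current, last, width=5):
    """Page numbers shown by a hypothetical sliding pager around `current`."""
    start = max(1, min(current - width // 2, last - width + 1))
    return list(range(start, min(last, start + width - 1) + 1))


def xpath_position(page, window):
    """1-based position of `page` among the visible links, like a[n] in XPath."""
    return window.index(page) + 1


if __name__ == "__main__":
    last_page = 20  # made-up total, used only for this demonstration
    for current in range(1, 9):
        window = pager_window(current, last_page)
        nxt = current + 1
        print(current, window,
              "next page is a[{}]".format(xpath_position(nxt, window)),
              "or a[data-p='{}']".format(nxt))
```

In this model the positional index of the next page climbs for the first few pages and then stays the same, which is why a counter-based XPath such as `a[" + str(count) + "]` ends up clicking the wrong link, whereas a selector keyed on the page number does not depend on where the link sits.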