Selenium navigation keeps looping (Python)
I just started using Selenium to scrape tables from a web page, so I implemented the page navigation with Selenium. However, when I run the code, the results keep looping over the same page. I'm fairly sure I wrote the code wrong. How should I fix the code so that the Selenium navigation works correctly?
import requests
import csv
from bs4 import BeautifulSoup as bs
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://dir.businessworld.com.my/15/posts/16-Computers-The-Internet')
# url = requests.get("https://dir.businessworld.com.my/15/posts/16-Computers-The-Internet/")
soup = bs(browser.page_source)
filename = "C:/Users/User/Desktop/test.csv"
csv_writer = csv.writer(open(filename, 'w'))

pages_remaining = True
while pages_remaining:
    for tr in soup.find_all("tr"):
        data = []
        # for headers ( entered only once - the first time - )
        for th in tr.find_all("th"):
            data.append(th.text)
        if data:
            print("Inserting headers : {}".format(','.join(data)))
            csv_writer.writerow(data)
            continue
        for td in tr.find_all("td"):
            if td.a:
                data.append(td.a.text.strip())
            else:
                data.append(td.text.strip())
        if data:
            print("Inserting data: {}".format(','.join(data)))
            csv_writer.writerow(data)
    try:
        # Checks if there are more pages with links
        next_link = driver.find_element_by_xpath('//*[@id="content"]/div[3]/table/tbody/tr/td[2]/table/tbody/tr/td[6]/a ]')
        next_link.click()
        time.sleep(30)
    except NoSuchElementException:
        rows_remaining = False
Check whether a "Next" button exists on the page and click it if so; otherwise break out of the while loop:
if len(browser.find_elements_by_xpath("//a[contains(.,'Next')]")) > 0:
    browser.find_element_by_xpath("//a[contains(.,'Next')]").click()
else:
    break
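To see what the `//a[contains(.,'Next')]` expression from the snippet above actually matches, it can be tried outside the browser against a static HTML fragment (an illustration using lxml, which is not part of the original answer; the sample pager markup is made up):

```python
from lxml import html

# Hypothetical pager markup, just to exercise the XPath expression.
snippet = """
<div id="pager">
  <a href="?page=1">Prev</a>
  <a href="?page=3">Next &gt;</a>
</div>
"""
tree = html.fromstring(snippet)

# contains(., 'Next') tests the element's full string value, so the
# link matches even when 'Next' is only part of its text.
links = tree.xpath("//a[contains(.,'Next')]")
```

This is why the answer's relative XPath is more robust than the long copied `//*[@id="content"]/...` path, which breaks as soon as the table layout shifts between pages.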
There is no need to use time.sleep(); use WebDriverWait instead.

Comments on the answer:

"It doesn't work. I tried copying the xpath of the 'Next' button, which is //*[@id="content"]/div[3]/table/tbody/tr/td[2]/table/tbody/tr/td[4]/a, and replaced //a[contains(.,'Next')] with it, but that doesn't work either. How should I change the xpath so the navigation works?"

"Please post your whole code. I have tested it and it does click the 'Next' button. You only have 2 pages of data, right?"

"Do you know why it keeps scraping page 1 and never scrapes the other pages?"

"You said earlier that it was working. It is clicking every 'Next' button on the page; I only checked the clicking, not your data."

"Thanks to you the pagination works now, but Selenium keeps scraping the same page even after navigating to another one."

Code:
import csv
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get('https://dir.businessworld.com.my/15/posts/16-Computers-The-Internet')
WebDriverWait(browser, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.postlisting")))
soup = bs(browser.page_source)
filename = "C:/Users/User/Desktop/test.csv"
csv_writer = csv.writer(open(filename, 'w'))

pages_remaining = True
while pages_remaining:
    WebDriverWait(browser, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.postlisting")))
    for tr in soup.find_all("tr"):
        data = []
        # for headers ( entered only once - the first time - )
        for th in tr.find_all("th"):
            data.append(th.text)
        if data:
            print("Inserting headers : {}".format(','.join(data)))
            csv_writer.writerow(data)
            continue
        for td in tr.find_all("td"):
            if td.a:
                data.append(td.a.text.strip())
            else:
                data.append(td.text.strip())
        if data:
            print("Inserting data: {}".format(','.join(data)))
            csv_writer.writerow(data)
    if len(browser.find_elements_by_xpath("//a[contains(.,'Next')]")) > 0:
        browser.find_element_by_xpath("//a[contains(.,'Next')]").click()
    else:
        break
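The "same page keeps getting scraped" symptom from the last comment comes from `soup` being built once, before the loop, so every iteration re-reads the first page's HTML. A minimal fix is to re-parse `browser.page_source` at the top of each iteration. One way to structure that (a sketch; the `extract_rows` helper name is my own, not from the original answer) keeps the row extraction in a pure function that can be tested against static HTML:

```python
from bs4 import BeautifulSoup as bs

def extract_rows(html):
    """Return one list per <tr>: header text from <th> cells,
    otherwise link/cell text from <td> cells. Empty rows are skipped."""
    soup = bs(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("th") or tr.find_all("td")
        row = [(c.a.text.strip() if c.a else c.text.strip()) for c in cells]
        if row:
            rows.append(row)
    return rows

# Inside the while loop the key change is to parse the *current* page
# on every iteration instead of reusing the pre-loop soup:
#
#     while pages_remaining:
#         WebDriverWait(browser, 10).until(...)
#         for row in extract_rows(browser.page_source):  # fresh parse each page
#             csv_writer.writerow(row)
#         ...click Next or break...
```

On Windows it is also worth opening the CSV with `open(filename, 'w', newline='')`, otherwise `csv.writer` emits a blank line between rows.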