Python: unable to locate a Selenium element
I'm writing a web scraper that goes through a list of links from a CSV file and grabs details from each one. However, I'm having trouble targeting the element that contains the email address I'm trying to get. If you look at [https://reality.idnes.cz/rk/detail/m-m-reality-holding-a-s/5a85b582a26e3a321d4f2700/] you can see a company name, an address, a phone number and an email. The email is the element I have a problem with. If you inspect the page source, you will quickly notice that the phone number and the email both have the same class, "item-icon". If you look at my code, you will see that I tried to reach the right element with an nth-child-style selector (nth-of-type), but for some reason that does not work either. Nothing is printed or written to the CSV file, so the element is not being found. This is the code I'm having trouble with:
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as ec
    from selenium.webdriver.chrome.options import Options
    import time
    import csv

    with open('links.csv') as read:
        reader = csv.reader(read)
        link_list = list(reader)

    with open('ScrapedContent.csv', 'w+', newline='') as write:
        writer = csv.writer(write)
        options = Options()
        options.add_argument('--no-sandbox')
        path = "/home/kali/Desktop/SRealityContentScraper/chromedriver"
        driver = webdriver.Chrome(path)
        wait = WebDriverWait(driver, 10)
        for link in link_list:
            driver.get(', '.join(link))
            time.sleep(2)
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "h1.b-annot__title.mb-5")))
            title = driver.find_element_by_css_selector("h1.b-annot__title.mb-5")
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
            offers = driver.find_element_by_css_selector("span.btn__text")
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "p.font-sm")))
            addresses = driver.find_element_by_css_selector("p.font-sm")
            try:
                information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon.measuring-data-layer")))
                phone_number = driver.find_element_by_css_selector("a.item-icon.measuring-data-layer")
            except Exception:
                pass
            try:
                information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.items:nth-of-type(2) span.items__item a.item-icon")))
                email = driver.find_element_by_css_selector("span.items:nth-of-type(2) span.items__item a.item-icon")
            except Exception:
                pass
            try:
                phone_number = phone_number.text
            except Exception:
                phone_number = " "
                pass
            try:
                email = email.text
            except Exception:
                email = " "
                pass
            print(title.text, " ", offers.text, " ", addresses.text, " ", phone_number, " ", email)
            writer.writerow([title.text, offers.text, addresses.text, phone_number, email])
        driver.quit()
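The try blocks are in the code because some pages in the link list are missing the email or the phone number. When that happens, the corresponding field is filled with a " " placeholder string. However, nothing is printed even when the information is present on the page, which leads me to believe the element is not being located correctly. I removed the try blocks to test the output, and Selenium did indeed confirm that the element could not be found. Without the nth-of-type part, the scraper grabs the phone number twice instead of the phone number and the email. As far as I understand, this is because Selenium always returns the first element on the page that matches the CSS selector, which is the phone number.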
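My question is: how do I target the element correctly so that the email is scraped properly? Thanks for any help! I'm starting to get desperate...

I'll try to help. My solution is to use XPath as the selector.
How it works -> "//a[./span[contains(@class, 'icon icon--email')]]"
//a -> any a element
[./span] -> that has a child span
[contains(@class, 'icon icon--email')] -> whose class attribute contains that string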
    xpath_phone = "//a[./span[contains(@class, 'icon icon--phone')]]"
    xpath_email = "//a[./span[contains(@class, 'icon icon--email')]]"
    # example for email
    try:
        information_list = wait.until(ec.presence_of_element_located((By.XPATH, xpath_email)))
        email = driver.find_element_by_xpath(xpath_email)
    except Exception:
        pass
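
An alternative to wrapping every lookup in try/except is `find_elements` (plural), which returns an empty list instead of raising when nothing matches. The sketch below is only an illustration under some assumptions: `first_text` is a hypothetical helper name, the email address is a placeholder, and the stub classes stand in for a real driver so the snippet runs without a browser. With Selenium you would pass your real `driver` and `By.XPATH` (whose value is the string "xpath").

```python
XPATH_PHONE = "//a[./span[contains(@class, 'icon icon--phone')]]"
XPATH_EMAIL = "//a[./span[contains(@class, 'icon icon--email')]]"


def first_text(driver, by, selector, default=" "):
    """Return the text of the first matching element, or `default` if none match."""
    matches = driver.find_elements(by, selector)  # empty list when nothing matches
    return matches[0].text if matches else default


class StubElement:
    """Stand-in for a WebElement; only the .text attribute is modelled."""

    def __init__(self, text):
        self.text = text


class StubDriver:
    """Stand-in for a WebDriver that 'contains' a fixed set of elements."""

    def __init__(self, texts_by_locator):
        self._texts_by_locator = texts_by_locator

    def find_elements(self, by, selector):
        texts = self._texts_by_locator.get((by, selector), [])
        return [StubElement(text) for text in texts]


# Hypothetical page that has an email link but no phone link.
driver = StubDriver({("xpath", XPATH_EMAIL): ["someone@example.com"]})

email = first_text(driver, "xpath", XPATH_EMAIL)
phone_number = first_text(driver, "xpath", XPATH_PHONE)
print(repr(email), repr(phone_number))
```

Because the value is computed fresh on every call, a page without an email cannot accidentally reuse the value from the previous link. Note that `find_elements` does not wait by itself, so it is probably best called after an explicit wait on something that is always present, such as the title.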
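
To print the email address, i.e. agorniak@mmreality.cz, you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
- Using CSS_SELECTOR: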
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.items__item>a[href*='@']"))).text)
- Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='items__item']/a[contains(@href, '@')]"))).text)
- Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
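
To see why filtering on the href works where the shared class does not, here is a small offline sketch that needs no browser. The HTML fragment is an assumption modelled loosely on the structure described in the question, not copied from the site, and the phone number and address are placeholders. It uses the standard library's `html.parser` to compare "first anchor wins" with "anchor whose href contains '@'".

```python
from html.parser import HTMLParser

# Hypothetical markup: both links share the class "item-icon".
SAMPLE_HTML = """
<span class="items">
  <span class="items__item"><a class="item-icon" href="tel:+420000000000"><span class="icon icon--phone"></span>+420 000 000 000</a></span>
  <span class="items__item"><a class="item-icon" href="mailto:someone@example.com"><span class="icon icon--email"></span>someone@example.com</a></span>
</span>
"""


class AnchorCollector(HTMLParser):
    """Collects (href, text) for every <a> element in document order."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_anchors(html):
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.anchors


anchors = collect_anchors(SAMPLE_HTML)
first_match = anchors[0][1]  # what a first-match lookup by class would give
email_matches = [text for href, text in anchors if "@" in href]
print(first_match)
print(email_matches)
```

The first anchor is the phone link, which matches the symptom in the question, while the href filter picks out only the email link, the same idea as `a[href*='@']` in the CSS selector above.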
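
Why not try using XPath for the email?
This solution works! I knew about XPath, but you seem to be using a more advanced kind of XPath. Special thanks for explaining how the XPath works in my case!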