Python Selenium prints the same information repeatedly
Hello, I am trying to scrape some data from a website that keeps its data in "dl" tags. Here is what the site structure looks like:
<div class="ecord-overview col-md-5">
<h2><span itemprop="name">Donald Duck</span></h2>
<dl class="row">
</dd>
<dt class="col-md-4">Email</dt>
<dd class="col-md-8">myemail.com</dd>
</dl>
<div class="ecord-overview col-md-5">
<h2><span itemprop="name">Mickey mouse</span></h2>
<dl class="row">
</dd>
<dt class="col-md-4">Email</dt>
<dd class="col-md-8">youremail.com</dd>
</dl>
... data goes on but value differs
So whenever I run the program, it prints the same dl content for every name, like this:
donald duck
Email
myemail.com
-------------
mickey mouse
Email
myemail.com
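I have already tried putting the dl lookup inside a for loop, the same way I print the name, but it also printed other things that I do not want.
What can I do?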
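driver.find_element_by_tag_name('dl') will always return the first matching element on the page. You need to use each record element to locate its own <dl>, or locate these elements directly:
for element in driver.find_elements_by_css_selector('.ThatsThem-record-overview dl'):
    print(element.text)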
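The difference between searching from the page and searching from each record can be sketched without a browser. In the minimal example below, FakeElement is a hypothetical stand-in for a Selenium WebElement that supports only tag-name lookups, and the page content is copied from the question. The string "tag name" is the value of Selenium's By.TAG_NAME, so the scrape_records function should also accept real elements in Selenium 4, though that is an assumption not tested here.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (tag-name lookups only)."""

    def __init__(self, tag, text="", children=()):
        self.tag = tag
        self._text = text
        self.children = list(children)

    @property
    def text(self):
        # Own text followed by the text of all descendants, one per line.
        parts = [self._text] + [child.text for child in self.children]
        return "\n".join(part for part in parts if part)

    def find_elements(self, by, value):
        assert by == "tag name", "this stand-in only understands tag-name lookups"
        found = []
        for child in self.children:
            if child.tag == value:
                found.append(child)
            found.extend(child.find_elements(by, value))
        return found

    def find_element(self, by, value):
        # Like Selenium, return only the first match in document order.
        return self.find_elements(by, value)[0]


def make_record(name, email):
    """Build one record block shaped like the HTML in the question."""
    return FakeElement("div", children=[
        FakeElement("h2", children=[FakeElement("span", name)]),
        FakeElement("dl", children=[
            FakeElement("dt", "Email"),
            FakeElement("dd", email),
        ]),
    ])


def scrape_records(records):
    """Look up the name and the dl inside each record, not on the whole page."""
    rows = []
    for record in records:
        name = record.find_element("tag name", "span").text
        details = record.find_element("tag name", "dl").text
        rows.append((name, details))
    return rows


records = [
    make_record("Donald Duck", "myemail.com"),
    make_record("Mickey mouse", "youremail.com"),
]
page = FakeElement("body", children=records)

# Searching from the page always yields the first dl, which is the reported bug.
print(page.find_element("tag name", "dl").text)

# Searching from each record yields that record's own dl.
rows = scrape_records(records)
for name, details in rows:
    print(name)
    print(details)
    print("-------------")
```

The key design choice is that every lookup starts from the record element rather than from the driver, so the first match is always the one that belongs to that record.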
It looks like you were close. Using the class record-overview should already fetch all the data you need. However, it is better to locate the individual names and emails by traversing to the child tags, which also avoids picking up text you do not want.
So ideally you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:
- Using CSS_SELECTOR:
names = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.record-overview>h2>span")))]
emails = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.record-overview dl.row dd")))]
for name, email in zip(names, emails):
    print("{} Email is {}".format(name, email))
- Using XPATH:
names = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class, 'record-overview')]/h2/span")))]
emails = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class, 'record-overview')]//dl[@class='row']//dd")))]
for name, email in zip(names, emails):
    print("{} Email is {}".format(name, email))
- Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
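One caveat with zipping two page-wide lists: zip pairs items purely by position, so if any record has more than one dd, the names and values drift out of step. A minimal sketch of grouping per record instead is shown below. It works on plain lists of strings standing in for the text read from each record, and the Phone field and its value are invented purely for illustration.

```python
def build_records(scraped):
    """scraped: one (name, dt_texts, dd_texts) tuple per record block."""
    records = []
    for name, labels, values in scraped:
        # zip runs inside a single record, so each dt meets its own dd.
        records.append({"Name": name, **dict(zip(labels, values))})
    return records


# Hypothetical text values; the Phone field is invented to show a second pair.
scraped = [
    ("Donald Duck", ["Email", "Phone"], ["myemail.com", "555-0100"]),
    ("Mickey mouse", ["Email"], ["youremail.com"]),
]

# Page-wide lists, as two separate whole-page queries would return them.
names = [name for name, _, _ in scraped]
all_values = [value for _, _, values in scraped for value in values]
flat_pairs = list(zip(names, all_values))  # Mickey is paired with Donald's phone

records = build_records(scraped)
for record in records:
    print(record)
```

If every record is known to hold exactly one dd, the page-wide zip is fine. The per-record grouping only matters once the number of fields can vary.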