Python: How to access HTML DOM attributes in Selenium using XPath or CSS selectors

python selenium xpath web-scraping

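Selenium version 3.141, ChromeDriver, Windows 10.

Hello, the goal is to extract the values of HTML DOM attributes, specifically the id, href, and data-download-file-url of each image shown on this website (the site was chosen purely for educational purposes). There are other ways to extract these items, but at the moment I am using the find_elements_by_xpath method. I would welcome suggestions for more efficient approaches that I am not aware of.

On that website, the XPath to the target element is

/html/body/main/section[2]/div/div/figure[X]/div

where the capital X stands for the index of the image on the page and runs from 1 to 50. Each figure belongs to the class showcase__content.

I tried the following lines: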

titles_element = browser.find_elements_by_xpath("//div[@class='showcase__content']/a")
# List Comprehension to get the actual repo titles and not the selenium objects.
titles = [x.text for x in titles_element]

However, titles_element did not pick up the DOM attributes, so titles came out as

[]

I also tried the following, but it gave me an error:

titles_element = browser.find_elements_by_xpath("//figure[1]/div[@class='showcase__content']//@data-download-file-url")

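I would really appreciate it if someone could shed some light on this issue.

[Image: example of the DOM attributes for figure 1; all attributes are highlighted in pink.]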

Now I can get the image tags and read the URL of each picture:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.freepik.com/search?dates=any&format=search&page=1&query=Polygonal%20Human&sort=popular")
# Optional explicit wait for the lazy-loaded images (needs WebDriverWait, EC and By imported):
# WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[class='lzy landscape lazyload--done']")))
# On this page the loaded images always carry the class "lzy landscape lazyload--done".
result = driver.find_elements_by_css_selector("[class='lzy landscape lazyload--done']")
for i in result:
    print(i.get_attribute('src'))

Result:

https://img.freepik.com/free-vector/innovative-medicine-abstract-composition-with-polygonal-wireframe-images-human-hand-carefully-holding-heart-vector-illustration_1284-30757.jpg?size=626&ext=jpg
https://img.freepik.com/free-vector/computer-generated-rendering-hand_41667-189.jpg?size=626&ext=jpg
https://img.freepik.com/free-vector/polygonal-wireframe-business-strategy-composition-with-glittering-images-human-hand-incandescent-lamp-with-text_1284-32265.jpg?size=626&ext=jpg
https://img.freepik.com/free-vector/particles-geometric-art-line-dot-engineering_31941-119.jpg?size=626&ext=jpg
........

Or get the showcase__link elements in the same way.
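
Both variants (the image src values and the showcase__link elements) can share one small helper. The following is only a minimal sketch, not the answerer's code: the function name collect_attribute, its parameters, and the hand-rolled polling loop are assumptions made for illustration. Selenium ships its own WebDriverWait and expected_conditions for waiting; the loop here just avoids an import dependency. It relies on driver.find_elements(by, value), which exists in Selenium 3.141 and 4.x, whereas the find_elements_by_* helpers used above were removed in Selenium 4.3.

```python
import time

CSS = "css selector"  # the string value of selenium's By.CSS_SELECTOR


def collect_attribute(driver, selector, attribute, timeout=5.0, poll=0.5):
    """Return `attribute` of every element matching the CSS `selector`.

    Polls until at least one element matches or `timeout` seconds pass,
    which gives lazy-loaded content a chance to appear.
    (Hypothetical helper; the name and defaults are illustrative.)
    """
    deadline = time.monotonic() + timeout
    elements = driver.find_elements(CSS, selector)
    while not elements and time.monotonic() < deadline:
        time.sleep(poll)
        elements = driver.find_elements(CSS, selector)
    return [element.get_attribute(attribute) for element in elements]
```

With a live driver this could be called as collect_attribute(driver, ".lzy.lazyload--done", "src") for the pictures, or collect_attribute(driver, "a.showcase__link", "href") for the links. The class selector .lzy.lazyload--done matches regardless of any other classes on the element, which should be less brittle than comparing the whole class attribute, assuming the page keeps using those class names.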

Try the following (explained in the code comments):

from selenium import webdriver
from time import sleep

driver = webdriver.Chrome()
driver.get("https://www.freepik.com/search?dates=any&format=search&page=1&query=Polygonal%20Human&sort=popular")

# Give the page a moment to load.
sleep(1)

# Get all "a" elements by XPath on the class attribute;
# find_elements_by_class_name("showcase__link") works too if you prefer.
titles_element = driver.find_elements_by_xpath("//a[@class='showcase__link']")

# Loop through the elements and extract the id, href, and data-download-file-url attributes.
for element in titles_element:
    element_id = element.get_attribute('id')  # avoid shadowing the built-in id()
    href = element.get_attribute('href')
    file_url = element.get_attribute('data-download-file-url')
    print(element_id, href, file_url)
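
As for why the second attempt in the question raised an error: an XPath that ends in //@data-download-file-url evaluates to attribute nodes, while Selenium's element lookups can only return elements, so the expression is rejected as an invalid selector. Selecting the element first and then calling get_attribute, as above, avoids that. If the values are needed later rather than just printed, they can be collected into dictionaries. This is a minimal sketch; link_records and ATTRIBUTES are names made up for the example, and the driver is assumed to be any Selenium WebDriver.

```python
XPATH = "xpath"  # the string value of selenium's By.XPATH

# The three attributes asked for in the question.
ATTRIBUTES = ("id", "href", "data-download-file-url")


def link_records(driver, xpath="//a[@class='showcase__link']"):
    """Return one dict per matched element, keyed by attribute name.

    Hypothetical helper: the XPath selects elements (never attribute
    nodes), and each attribute is then read with get_attribute().
    """
    links = driver.find_elements(XPATH, xpath)
    return [
        {name: link.get_attribute(name) for name in ATTRIBUTES}
        for link in links
    ]
```

Each record can then be read by key, for example record["data-download-file-url"]; an attribute missing on an element comes back as None.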

Do you want to get the elements shown in the picture you posted?

I want to extract the three attributes id, href, and data-download-file-url. That said, an extra example for getting the images would be most welcome.

I wish I could accept more than one answer. Your suggestion deserves extra credit because it also covers extracting the images. May I ask whether finding elements by CSS selector is better than finding them by XPath?

@balandongiv You can read up on this. CSS selectors seem to be faster than XPath, and in your example the CSS selector is also very clear.

Thanks @jizhaosama for the information. I will accept your suggestion since it is more efficient.

I wish I could accept more than one answer. Your suggestion deserves extra credit for the additional explanation. More importantly, you pointed out that all of these attributes actually sit under the class showcase__link (which I did not know before). Edit: Thanks @Thaer, your suggestion is great, but I had to accept the other one as the answer.
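
On the CSS versus XPath question, one detail worth noting is that the two locators used in this thread are not strictly equivalent. The XPath //a[@class='showcase__link'] matches only when the class attribute is exactly that string, while the CSS selector a.showcase__link matches any anchor that carries the class, possibly among others. The sketch below builds both forms for a given tag and class; the helper names are hypothetical, and whether the anchors on the page carry additional classes is not something the thread establishes.

```python
def css_for_class(tag, class_name):
    """CSS selector for `tag` elements carrying `class_name` (hypothetical helper)."""
    return f"{tag}.{class_name}"


def xpath_for_class(tag, class_name):
    """XPath with the same token-based class matching as the CSS form.

    Padding the normalized class attribute with spaces lets contains()
    match a whole class token rather than a substring of another class.
    """
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {class_name} ')]"
    )
```

Either string can be passed to the driver with the matching locator strategy. The CSS form is clearly shorter; the speed difference mentioned in the comments is not measured here.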