Python 含硒产品的刮色_Python_Selenium_Web Scraping

Python 含硒产品的刮色

python selenium web-scraping

Python 含硒产品的刮色,python,selenium,web-scraping,Python,Selenium,Web Scraping,product = driver.find_elements_by_partial_link_text(keyword) for item in product: if item.parent.parent.find("p") == wanted_color: item.get_attribute("href") products = [] for element in anchors: if element.find_element_by_xpath('..'

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

链接到完整的源代码：

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

试图从网站上删除产品元素及其颜色。我已经可以提取产品的名称并单击它，但是我希望能够提取其中包含特定关键字的所有产品，然后单击我想要的颜色的产品。感谢您的帮助

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

编辑：我试过的

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

错误：

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

Traceback (most recent call last):   File "C:/Users/B/PycharmProjects/BasicSelenium/test.py", line 17, in <module>
if item.parent.parent.find("p") == color:  AttributeError: 'WebDriver' object has no attribute 'parent'

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

Traceback（最近一次调用last）：文件“C:/Users/B/PycharmProjects/BasicSelenium/test.py”，第17行，在
如果item.parent.parent.find（“p”）==color:AttributeError:'WebDriver'对象没有属性'parent'

这里有一种方法：

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

from selenium import webdriver

browser = webdriver.Chrome()
browser.get(url)
anchors = browser.find_elements_by_class_name('name-link')

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

这为我们提供了一个交替标记列表，如下所示：

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

<h1><a class="name-link" href="/shop/blahblah">Very Cool Sweatshirt</a></h1>
<p><a class="name-link" href="/shop/blahblah">Red</a></p>

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

或者，我们可以使用父标记名称筛选内容：

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

对于类似的内容，我将编写一个函数，该函数接受关键字和颜色名称。您可以获取这些值并将它们插入到单个XPath中，然后单击返回的a标记

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

def select_product(keyword, color)
    driver.find_element_by_xpath("//article//a[contains(., '" + keyword + "')]/../../p/a[contains(., '" + color + "')]").click()

你可以这样称呼它

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

select_product("Geto Boys", "Ash Grey")

一些快速的XPath信息

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

表示任何深度，而

表示子级（向下一级）

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

a[contains（，“some text”）]

表示查找包含文本“some text”的

标记。

contains（）

中的

是

text（）

的快捷方式，它仅表示元素中包含的文本

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

/…

表示上升一级

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

因此，把这一切放在一起，它读作find a

ARTICLE

标记，它在任何级别上都有一个后代（任何级别）

标记，该标记包含

关键字

文本，该文本的父级（上两级）有一个

子级，该子级有一个

子级，该子级包含

颜色

文本

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

XPath本身就是一种编程语言。您最好阅读XPath指南

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

旁注。。。我建议您按以下顺序查找元素：

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])

凭身份证

通过CSS选择器

…然后，如果这两种方法都找不到，可以使用XPath通过包含的文本定位元素。XPath速度较慢，并且不如CSS选择器受支持。我在本例中使用它是因为您需要根据包含的文本查找元素，否则我将使用CSS选择器。

向我们展示您到目前为止获得的信息，以便我们知道从何处开始。请发布一些页面源代码或共享指向页面源代码的链接，而不是屏幕截图。如果id尝试添加图像，则一定没有添加，我会添加rn@bren添加它们！很好，你已经试过这样的东西了吗？向我们展示您迄今为止尝试过的代码，以及它是如何不按预期工作的。edited@bren提供了一个我尝试过的示例，我也尝试过其他东西，但记不清我到底做了什么。如果它工作得很好，请更深入地解释一下您是如何创建xpath的？很抱歉，这是python编码的新手，任何信息都有帮助！我更新了我的答案，解释了XPath和其他一些建议。

product = driver.find_elements_by_partial_link_text(keyword)
for item in product:
    if item.parent.parent.find("p") == wanted_color:
        item.get_attribute("href")

products = []
for element in anchors:
    if element.find_element_by_xpath('..').tag_name == 'p':  # or 'h1'
        text = element.text
        products.append([element, text])