Python Selenium, check if <div …> contains a word in web scraping code

python html css selenium web-scraping

I'm running a scraper that uses Selenium and BeautifulSoup, and I want to check whether a certain word appears.

The HTML snippet looks like this:

<div data-asin="0974158232" data-index="0" data-uuid="1f362f6b-dde2-4377-a5f3-518513486b7d" data-component-type="s-search-result" class="s-result-item s-asin sg-col-0-of-12 sg-col-16-of-20 sg-col sg-col-12-of-16" data-component-id="14" data-cel-widget="search_result_0"><div class="sg-col-inner">
<div data-asin="" data-index="1" class="a-section a-spacing-none s-result-item s-flex-full-width s-border-bottom-none s-widget" data-cel-widget="search_result_1">
<div data-asin="" data-index="2" class="a-section a-spacing-none s-result-item s-flex-full-width s-border-bottom-none s-widget" data-cel-widget="search_result_2">

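Here, I want to tell the code to look for data-asin="" and check whether it is an empty string. In this case it would not be empty, because the first div has data-asin="0974158232".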

To search only for elements with a non-empty data-asin="…", you can use the following example:
import requests
from bs4 import BeautifulSoup


url = "https://www.amazon.com/s?k=A+Biblically+Based+Model+of+Cultural+Competence+in+the+Delivery+of+Healthcare+Services%3A+Seeing&ref=nb_sb_noss"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
    "Accept-Language": "en-US,en;q=0.5",
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

# select only divs whose data-asin has a non-empty value; print the ASIN and the title
for div in soup.find_all("div", {"data-asin": bool}):
    print(div["data-asin"], div.select_one(".a-text-normal").text)

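Prints:
0974158232 A Biblically Based Model of Cultural Competence in the Delivery of Healthcare Services: Seeing
1433692163 Planting Churches: Your Guide to Starting Churches That Multiply
0310341728 Imperfect: Broken Men and Women of the Bible and What We Can Learn from Them
0800796853 God's Smuggler
1885904088 The Excellent Wife: A Biblical Perspective
B07K7YJPXD Hope Channel
B07F1DNGMS Alistair Begg - Truth for Life
B07DHZ6DL9 Star Trek Beyond (4K UHD)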
B00100 Zoni's Heart of the Ukulele
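
To see what the {"data-asin": bool} filter does without making a network request, here is a minimal sketch that runs it on a trimmed copy of the HTML from the question. The shortened class lists, the added closing tags, and the helper name non_empty_asins are assumptions made for illustration; they are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's snippet. Class lists are shortened and
# closing tags are added (both are assumptions) so the fragment is complete.
html = """
<div data-asin="0974158232" data-index="0" class="s-result-item s-asin"></div>
<div data-asin="" data-index="1" class="a-section s-result-item s-widget"></div>
<div data-asin="" data-index="2" class="a-section s-result-item s-widget"></div>
"""


def non_empty_asins(html_source):
    """Return the data-asin values that are not empty strings."""
    soup = BeautifulSoup(html_source, "html.parser")
    # The attribute value is passed to bool(); bool("") is False,
    # so divs with data-asin="" are filtered out.
    return [div["data-asin"] for div in soup.find_all("div", {"data-asin": bool})]


asins = non_empty_asins(html)
print(asins)
```

Only the first div survives the filter, because it is the only one whose data-asin is a non-empty string. The same function should accept any HTML string, whether it comes from requests or from Selenium's driver.page_source.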
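Can you share the URL? Can you use BeautifulSoup?

@AndrejKesely Sure, thanks for the reply! The URL is … I have imported Beautiful Soup, yes. I've never used it, but it would be great to learn something, so please go ahead.

Thanks @AndrejKesely! I'll test it soon, but it looks promising. Could I ask you to elaborate on two things? 1) I have to run the code over a large number of products, so running it from a URL doesn't seem feasible. Would soup = BeautifulSoup(html) work? 2) Could you explain what {"data-asin": bool} does?

@econnoob5 1.) You need to load the page somehow and then "feed" the HTML source to BeautifulSoup. You can do that with requests or selenium. 2.) {"data-asin": bool} selects all elements with a data-asin= attribute for which bool() evaluates to True, so it filters out empty attributes ("").

Thanks, very clear @AndrejKesely. What I'm struggling with is that I don't have a list of URLs. I type a list of products into the Amazon search box, but at this point I don't know how to get the URLs so I can feed the HTML source to BeautifulSoup. Do you have any idea how to do that? It looks like requests could do the job, but I'm not sure how.

@econnoob5 You can try changing the k parameter in the URL to search for new products.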
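
The suggestion to change the k parameter could be sketched roughly as follows. This is only an illustration: the helper name build_search_url and the second product name are hypothetical, and the sketch assumes that the search page needs nothing beyond k (the answer's URL also carries a ref parameter, which is left out here).

```python
from urllib.parse import urlencode

# Search endpoint taken from the URL used in the answer.
BASE_URL = "https://www.amazon.com/s"


def build_search_url(product_name):
    """Return a search URL whose k parameter is the URL-encoded product name."""
    return f"{BASE_URL}?{urlencode({'k': product_name})}"


# Example product list; the first entry is the search term from the answer's
# URL, the second is made up for illustration.
products = [
    "A Biblically Based Model of Cultural Competence in the Delivery of Healthcare Services: Seeing",
    "God's Smuggler",
]

urls = [build_search_url(name) for name in products]
for url in urls:
    print(url)
```

Each generated URL can then be fetched with requests.get(url, headers=headers) as in the answer, or opened with Selenium via driver.get(url), after which driver.page_source can be passed to BeautifulSoup.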