Python: How to extract the content (image) of a div using Beautiful Soup


python web-scraping


Using Python and Beautiful Soup, I can't find this div:

links = soup.find_all('div', attrs={'class': 'product_image clearfix'})

After that, I need to extract the image.

With the current version of BeautifulSoup, this should work:

links = soup.find_all('div', class_='product_image clearfix')
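A small sketch comparing ways to match the div, assuming markup shaped like the question's snippet (the second div with reversed class order is made up for illustration). In bs4, a space-separated string passed to class_ matches the exact attribute value, so class order matters; a single class name matches any element carrying that class; and select() with a CSS selector requires both classes in any order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes, written in two different orders
html = (
    '<div class="product_image clearfix"><img src="a.jpg"></div>'
    '<div class="clearfix product_image"><img src="b.jpg"></div>'
)
soup = BeautifulSoup(html, "html.parser")

# Exact match on the full class attribute string (order-sensitive)
exact = soup.find_all("div", class_="product_image clearfix")

# Single class: any div whose class list contains "product_image"
single = soup.find_all("div", class_="product_image")

# CSS selector: the div must carry both classes, in any order
css = soup.select("div.product_image.clearfix")

print(len(exact), len(single), len(css))  # 1 2 2
```

If the exact-string form finds nothing on a real page, trying the single-class or CSS-selector form helps tell a class-order mismatch apart from the element simply not being in the HTML.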

Which version of BeautifulSoup are you using? You should be able to print the contents of the div like this:

from bs4 import BeautifulSoup

html = """<div class="product_image clearfix">
  <img src="https://res.sastasundar.com/incom/images/product/thumb/XPLOR-Dark-Chocolate-Brownie-1542880911-10051353-1.jpg" title="XPLOR Dark Chocolate Brownie 50 gm" class=" center-block">
</div>"""

soup = BeautifulSoup(html, 'html.parser')

for div in soup.find_all('div', class_='product_image clearfix'):
  for img in div.find_all('img', recursive=False):
    print(img)
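If what you need is the image URL rather than the whole img tag, you can read its src attribute. The sketch below also shows what recursive=False changes: it only looks at direct children of the div, so an img wrapped in another tag would be skipped. The nested a/img layout here is a hypothetical example, not taken from the question's page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one direct <img> child and one <img> nested in an <a>
html = """<div class="product_image clearfix">
  <img src="direct.jpg">
  <a href="#"><img src="nested.jpg"></a>
</div>"""
soup = BeautifulSoup(html, "html.parser")

div = soup.find("div", class_="product_image clearfix")

# recursive=False: only <img> tags that are direct children of the div
direct_only = [img.get("src") for img in div.find_all("img", recursive=False)]

# Default (recursive=True): every <img> anywhere inside the div
all_imgs = [img.get("src") for img in div.find_all("img")]

print(direct_only)  # ['direct.jpg']
print(all_imgs)     # ['direct.jpg', 'nested.jpg']
```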


From what I could gather, here is one way that works.

You can get the tags you need with:

tags = soup.find_all('div', "product_image clearfix")

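where the second positional argument is treated as the element's CSS class by default. You can then inspect the tag's children, either as a list via .contents or by iterating over .children. In this example I use .children for simplicity and extract the image source from the first matching tag: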

import bs4

soup = bs4.BeautifulSoup("<div class=\"product_image clearfix\"> <img src=\"https://res.sastasundar.com/incom/images/product/thumb/XPLOR-Dark-Chocolate-Brownie-1542880911-10051353-1.jpg\" title=\"XPLOR Dark Chocolate Brownie 50 gm\" class=\" center-block\"></div>")

tags = soup.find_all('div', "product_image clearfix")

img_src = None

for t in tags[0].children:
    if type(t) == bs4.element.Tag:
        img_src = t['src']

print(img_src)


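The type check matters because tags[0] can contain bs4.element.NavigableString objects. Depending on the HTML parser, line breaks or whitespace between tags become text nodes, and indexing one of those with ['src'] would fail.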
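A quick way to see those text nodes is to list the child types, assuming the same markup as the answer above (note the space before the img tag) and the built-in html.parser. The sketch also shows isinstance() as the usual way to filter for Tag children, and find() as an alternative that skips text nodes on its own.

```python
import bs4

# Same shape as the answer's markup: note the space before <img>
soup = bs4.BeautifulSoup(
    '<div class="product_image clearfix"> <img src="x.jpg"></div>',
    "html.parser",
)
div = soup.find_all("div", "product_image clearfix")[0]

# .contents is a plain list; the leading space shows up as a NavigableString
kinds = [type(child).__name__ for child in div.contents]
print(kinds)  # ['NavigableString', 'Tag']

# Keep only Tag children before indexing with ['src']
srcs = [c["src"] for c in div.children if isinstance(c, bs4.element.Tag)]
print(srcs)  # ['x.jpg']

# Alternatively, let find() skip the text nodes for you
print(div.find("img")["src"])  # x.jpg
```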
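
The whole collection is loaded dynamically, so the div is not in the HTML you download. You can make the same request the page makes:
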
import requests

base = 'https://res.sastasundar.com/incom/images/product/'
r = requests.get('https://www.retailershakti.com/category/loadBrandListData?MfgGroup=&categoryId=1357&size=50&page=1').json()
images = [base + i['idata'][0]['ProductImage'] for i in r]
print(images)
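The comments below ask why [0] appears and how the loop in that line works. Here is an offline sketch of the same list comprehension, run against a hypothetical payload shaped the way the code above reads it: a list of products, each with an "idata" list whose first entry has a "ProductImage" key. The sample values are made up, and the real endpoint's fields may differ.

```python
# Hypothetical payload shaped like the one the answer reads (values are made up)
r = [
    {"idata": [{"ProductImage": "thumb/brownie.jpg"}]},
    {"idata": [{"ProductImage": "thumb/bar.jpg"}, {"ProductImage": "thumb/bar-2.jpg"}]},
]
base = "https://res.sastasundar.com/incom/images/product/"

# List comprehension, as in the answer: [0] takes the first dict in each "idata" list
images = [base + i["idata"][0]["ProductImage"] for i in r]

# The same thing written as an ordinary for loop
images_loop = []
for i in r:
    first = i["idata"][0]
    images_loop.append(base + first["ProductImage"])

print(images == images_loop)  # True
```

Only the first image of each product is kept; if you want all of them, a nested comprehension over i["idata"] would collect every entry.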

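Doesn't work. No error is shown, but nothing is printed.

What happens if you search for only one class, e.g. links = soup.find_all('div', class_='product_image')?

Same. No results.

That works with the snippet, but not when parsing the live web page: https://www.retailershakti.com/category/brand_listing/chocolates-and-brownies/1357

Try this.

I don't get it.

You can find the URL in the Network tab (F12) when you refresh the page with F5.

Great! Found it. Can you tell me why there is a [0] when indexing the JSON, and how the for loop works inside... what syntax is this?

The 0 is because i['data'] is a list, and in the list comprehension you want the first item of that list. The "loop" is actually a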