Python BeautifulSoup: How to select a certain tag

I'm confused about how Beautiful Soup works when you want to grab a child of a tag. I have the following HTML code:

<div class="media item avatar profile">
<a href="http://..." class="media-link action-medialink">
<img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
</a>
</div>

My current code prints the whole img tag. How do I select only the src?

Thanks.

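src is an attribute of the tag. Once you have the tag, you can access its attributes the same way you access dictionary keys. You have only found the enclosing tag, so you also need to navigate to the <img> tag it contains: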

# x is each matching <div>; step down through its <a> to the <img>
for x in soup.find_all('div', attrs={'class': 'media item avatar profile'}):
    print(x.a.img['src'])

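Your code used findNext(), which returns a tag object; looping over that tag gives you its children, so x was the img object. I changed it to be more direct and clearer: x is now the <div>, and we navigate directly to the first <a> and the <img> tag it contains.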
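As a minimal sketch of the dictionary-style access described above, the following reuses the HTML from the question with bs4 and the built-in html.parser. The attribute name data-x is a made-up example used only to show what happens when an attribute is absent.

```python
from bs4 import BeautifulSoup

html = """
<div class="media item avatar profile">
<a href="http://..." class="media-link action-medialink">
<img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

div = soup.find("div", attrs={"class": "media item avatar profile"})
img = div.a.img              # first <a> inside the div, then first <img> inside that

src = img["src"]             # dictionary-style access; raises KeyError if absent
missing = img.get("data-x")  # hypothetical attribute: .get() returns None instead of raising
all_attrs = img.attrs        # plain dict holding every attribute on the tag

print(src)
```

Using .get() rather than square brackets is usually the safer choice when the attribute may not be present on every tag.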

I think you may want something like this:

soup.find('div', attrs={'class':'media item avatar profile'}).a.img['src']

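In [1]: from bs4 import BeautifulSoup

In [2]: html = """\
   ...: <div class="media item avatar profile">
   ...: <a href="http://..." class="media-link action-medialink">
   ...: <img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
   ...: </a>
   ...: </div>"""

In [3]: soup = BeautifulSoup(html)

In [4]: soup.find('div', attrs={'class':'media item avatar profile'}).a.img['src']
Out[4]: 'http://...jpeg'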
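The one-line chain raises an AttributeError as soon as one step finds nothing, because find returns None in that case. A hedged sketch of a guarded version follows; the helper name first_img_src and its css_class parameter are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup


def first_img_src(html, css_class="media item avatar profile"):
    """Return the src of the first <img> inside the first matching <div>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", attrs={"class": css_class})
    if div is None:          # no such div in the document
        return None
    img = div.find("img")    # searches only inside the div
    if img is None:          # the div holds no image
        return None
    return img.get("src")    # None if the img has no src attribute


html = (
    '<div class="media item avatar profile">'
    '<a href="http://..."><img src="http://...jpeg"></a>'
    "</div>"
)
print(first_img_src(html))
print(first_img_src("<p>no match here</p>"))
```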

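findNext returns the first item that matches the given criteria and appears after the given tag in the document. Note that this means the tag it returns is not guaranteed to be a child of the given tag (for example, of the <div> tag).

Use findChildren to restrict the search to the children of the given tag: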

import bs4 as bs  # BeautifulSoup 4 package

file_ = '''<html>
<div class="media item avatar profile">
<a href="http://..." class="media-link action-medialink">
<img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
</a>
</div>
</html>
'''
soup = bs.BeautifulSoup(file_, 'html.parser')
# findChildren (an older alias of find_all) searches only inside the div
for x in soup.find(
        'div', attrs={'class': 'media item avatar profile'}).findChildren('img'):
    print(x['src'])
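The difference between searching after a tag and searching inside it is easiest to see when the div holds no image at all. The sketch below uses the bs4 spellings find_next and find_all; the file name outside.png and the link text are invented for illustration.

```python
from bs4 import BeautifulSoup

# Assumed sample markup: the div has no <img>, but one follows it in the document.
html = """
<div class="media item avatar profile"><a href="http://...">no image in this div</a></div>
<img src="outside.png">
"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", attrs={"class": "media item avatar profile"})

after_div = div.find_next("img")   # first <img> later in the document, inside the div or not
inside_div = div.find_all("img")   # only <img> tags nested inside the div

print(after_div["src"])
print(inside_div)
```

Here find_next still finds an image even though it is not a descendant of the div, while the scoped find_all comes back empty.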

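Thanks for the quick reply. I tried that but got the following error: print x['src'] TypeError: string indices must be integers

@evi: Updated; you had a different tag there, not the img tag.

You don't need to loop when you use find rather than findAll. I think you have too much looping going on :P soup.find('div', attrs={'class': 'media item avatar profile'}) returns a single tag, not a list.

@root: Ugh, I need more caffeine. My shell still used find_all('a'). :-P And find returns a single element, so the loop iterates over its children, including any NavigableString items.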
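The TypeError from the comments can be reproduced with a short sketch: iterating over a tag yields its direct children, and the whitespace between tags shows up as NavigableString items, which behave like plain strings. The filter with isinstance and has_attr is one possible way to skip them, shown here as an assumption rather than as the fix the answerer used.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

html = """<div class="media item avatar profile">
<a href="http://..." class="media-link action-medialink">
<img class="media-item-img" src="http://...jpeg" alt="name">
</a>
</div>"""
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

children = list(link)   # iterating a Tag yields its direct children
kinds = [type(c).__name__ for c in children]
print(kinds)            # a mix of NavigableString and Tag entries

# Indexing a NavigableString with ['src'] is string indexing, hence the TypeError.
# Keeping only Tag children that carry a src attribute avoids it.
srcs = [c["src"] for c in children if isinstance(c, Tag) and c.has_attr("src")]
print(srcs)
```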
Output of the findChildren example above:

http://...jpeg