Tags in Python HTML parsing

python html image

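I'm writing a parsing script with BeautifulSoup in which I search a page for all img tags and want to grab only the pictures that have a certain width.

Here is an example of the markup: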

<img 
  alt="" 
  src="//upload.wikimedia.org/wikipedia/en/thumb/a/a9/Example.jpg/111px-Example.jpg"
  width="111"
  height="120"
/>

This doesn't seem to work. It seems that a missing width attribute does not yield a NoneType. So, if not None, then what?

BeautifulSoup provides a way to handle this in the call itself:

[img for img in soup.findAll("img") if "width" in img.attrs]

soup.findAll("img", width=True)

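From the BeautifulSoup documentation:

The special values True and None are of special interest. True matches a tag that has any value for the given attribute, and None matches a tag that has no value for the given attribute. Some examples: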

soup.findAll(align=True)
# [<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
#  <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

[tag.name for tag in soup.findAll(align=None)]
# [u'html', u'head', u'title', u'body', u'b', u'b']
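To see both filters side by side, here is a minimal sketch. It assumes the bs4 package (where find_all is the current spelling of findAll) and a made-up three-image snippet; the file names a.jpg, b.jpg and c.jpg are placeholders, not taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second image has no width attribute.
html = """
<img src="a.jpg" width="111" height="120" />
<img src="b.jpg" height="50" />
<img src="c.jpg" width="300" />
"""

soup = BeautifulSoup(html, "html.parser")

# width=True keeps only tags that carry a width attribute, whatever its value.
with_width = soup.find_all("img", width=True)

# The same filter written as a list comprehension over the attrs dict.
via_attrs = [img for img in soup.find_all("img") if "width" in img.attrs]

srcs = [img["src"] for img in with_width]
srcs_via_attrs = [img["src"] for img in via_attrs]
print(srcs)            # ['a.jpg', 'c.jpg']
print(srcs_via_attrs)  # ['a.jpg', 'c.jpg']
```

Both approaches select the same tags here, so the choice is mostly a matter of taste.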

Some additional context that may help: if you run into a similar problem in the future that can't be solved with a list comprehension, try img.get('width'). If there is no such attribute, it returns None, as you expected.
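As a rough illustration of the difference, the sketch below (again assuming bs4) contrasts subscript access, which raises KeyError when the attribute is missing, with get, which returns None. The helper images_at_least and its min_width threshold are hypothetical names for illustration; reading "a certain width" as a minimum is an assumption, not something the question states.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second image has no width attribute.
html = """
<img src="a.jpg" width="111" height="120" />
<img src="b.jpg" height="50" />
<img src="c.jpg" width="300" />
"""

soup = BeautifulSoup(html, "html.parser")


def images_at_least(soup, min_width):
    """Return img tags whose numeric width attribute is >= min_width (hypothetical helper)."""
    matches = []
    for img in soup.find_all("img"):
        width = img.get("width")  # None when the attribute is absent
        # Skip missing or non-numeric widths such as "50%".
        if width is not None and width.isdigit() and int(width) >= min_width:
            matches.append(img)
    return matches


no_width = soup.find_all("img")[1]

# Subscript access raises KeyError for a missing attribute; it does not return None.
try:
    no_width["width"]
    missing_raises = False
except KeyError:
    missing_raises = True

wide_srcs = [img["src"] for img in images_at_least(soup, 200)]
print(missing_raises)         # True
print(no_width.get("width"))  # None
print(wide_srcs)              # ['c.jpg']
```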