(Python) Parsing certain HTML output tags with Beautiful Soup

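Hello :) After playing around for a while, I came up with a function that returns the full HTML tag rather than just the text inside the tag.

For example, today's word is "nosh". Instead of getting:

[<h2 class="me">nosh</h2>]

I would like to get just the word nosh. Does anyone know how I might do this?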

Use lxml instead of BeautifulSoup:

>>> from lxml.html import parse
>>> tree = parse("http://www.reference.com/wordoftheday")
>>> tree.xpath("//h2")[0].text
'nosh'
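The same lookup can be tried offline. The following is a minimal sketch, assuming lxml is installed; the inline markup is a stand-in for the real page, whose layout is not shown in this thread. It also illustrates one caveat: .text only returns the text that comes before the first child element, while text_content() collects all nested text.

```python
from lxml import html  # third-party package: pip install lxml

# Stand-in markup (an assumption, not the real page source).
page = '<html><body><h2 class="me">nosh</h2></body></html>'
root = html.fromstring(page)

headings = root.xpath("//h2")  # list of matching elements
word = headings[0].text if headings else None
print(word)

# With nested markup, .text is None because no text precedes the <b> child;
# text_content() gathers the text of the element and all its descendants.
nested = html.fromstring('<html><body><h2><b>no</b>sh</h2></body></html>')
nested_h2 = nested.xpath("//h2")[0]
print(nested_h2.text)
print(nested_h2.text_content())
```

Checking that the xpath() result is non-empty before indexing avoids an IndexError when the page has no h2 element.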

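Use the .text attribute to get the inner text, and use the find() method, which returns a single element rather than a list of matches: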

>>> from BeautifulSoup import BeautifulSoup
>>> from urllib2 import urlopen
>>> soup = BeautifulSoup(urlopen('http://www.reference.com/wordoftheday'))
>>> soup.find('h2').text
u'nosh'
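The session above uses the old BeautifulSoup 3 import and Python 2's urllib2. As a rough modern equivalent, here is a minimal sketch assuming the bs4 package (Beautiful Soup 4) on Python 3. The inline markup is a stand-in for the real page, so nothing is fetched over the network.

```python
from bs4 import BeautifulSoup  # Beautiful Soup 4: pip install beautifulsoup4

# Stand-in markup (an assumption, not the real page source).
page = '<html><body><h2 class="me">nosh</h2></body></html>'
soup = BeautifulSoup(page, "html.parser")

matches = soup.find_all("h2")  # list-like result holding whole Tag objects
first = soup.find("h2")        # a single Tag, or None if nothing matches

# get_text() returns only the text inside the tag, without the markup.
word = first.get_text() if first is not None else None

print(matches)  # the full tag wrapped in a list, as in the question
print(word)     # just the inner text
```

Printing matches reproduces the bracketed full tag from the question, which suggests the original function returned the result of a find-all style call instead of the text of a single tag.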

As for why the lxml answer above uses lxml, the reason is simple: BeautifulSoup does not seem to be installed on my Mac.
