Python: Parsing an HTML tag with Beautiful Soup

python python-3.x


I need help parsing the HTML tag below without using any regular expressions. I need to extract the string

'House NO /2012/'

<p class="cold" style="clear:both">House NO /2012/</p>


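How about the following:

I can think of three ways to do it:

BeautifulSoup (using CSS selectors)

lxml (using XPath selectors)

Regular expressions (using a pattern)

The code is as follows: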

# pip install beautifulsoup4
from bs4 import BeautifulSoup

html = '<p class="cold" style="clear:both">House NO /2012/</p>'
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; use find('p', class_='cold') if the page has more <p> tags
paragraph = soup.find('p')
print('BeautifulSoup:', paragraph.text)
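
The BeautifulSoup snippet above locates the tag with find(). Since the list of approaches mentions CSS selectors, here is a minimal sketch of that route using select_one(). The variable names (soup, tag, text) and the None check are illustrative additions, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = '<p class="cold" style="clear:both">House NO /2012/</p>'
soup = BeautifulSoup(html, "html.parser")

# select_one() takes a CSS selector and returns the first match, or None
tag = soup.select_one("p.cold")

# Guard against a missing tag instead of failing with AttributeError
text = tag.get_text(strip=True) if tag is not None else None
print("CSS selector:", text)
```

If several paragraphs share the class, soup.select("p.cold") returns all of them as a list.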


# pip install lxml
from lxml import etree

html = '<p class="cold" style="clear:both">House NO /2012/</p>'
# etree.fromstring() is an XML parser; it works here because the snippet is well-formed
source = etree.fromstring(html)
paragraphs = source.xpath('//p')  # or '//p[@class="cold"]' to match on the class
print('lxml:', paragraphs[0].text)
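
The lxml snippet above uses the XML parser, which will likely reject real-world HTML that is not well-formed. As a hedged sketch, the same XPath query can be run through lxml's HTML parser instead. The sample markup in page (with its unclosed tags) is made up for illustration.

```python
from lxml import html as lxml_html

# Hypothetical page fragment with an unclosed <br> and <p>, which an XML parser would reject
page = (
    '<div><p class="cold" style="clear:both">House NO /2012/</p>'
    '<br><p>other text</div>'
)

tree = lxml_html.fromstring(page)

# xpath() returns a list of matching elements (empty if nothing matches)
matches = tree.xpath('//p[@class="cold"]')

# text_content() also collects text from nested child elements
lxml_text = matches[0].text_content() if matches else None
print("lxml.html:", lxml_text)
```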


import re

html = '<p class="cold" style="clear:both">House NO /2012/</p>'
# Capture the text between the opening and closing tag; [^<]* stops at the first '<'
match = re.search(r'>([^<]*)<', html)
print('Regular Expressions:', match.group(1))
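
Since the question asks for a solution without regular expressions, and in case installing a third-party package is not an option, here is a rough sketch using only the standard library's html.parser module. The ParagraphText class and its attribute names are hypothetical helpers written for this example, not part of any library.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text of <p> tags that carry a given class (hypothetical helper)."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.inside = False   # True while we are inside a matching <p>
        self.chunks = []      # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and self.wanted_class in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


html = '<p class="cold" style="clear:both">House NO /2012/</p>'
parser = ParagraphText("cold")
parser.feed(html)
parser.close()

stdlib_text = "".join(parser.chunks)
print("html.parser:", stdlib_text)
```

This is more verbose than the BeautifulSoup or lxml versions, so it mainly makes sense when no extra dependency can be installed.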

adel, if the answer below helped, please remember to accept it by clicking the check mark to its left.

Output:

BeautifulSoup: House NO /2012/
lxml: House NO /2012/
Regular Expressions: House NO /2012/