Python: parsing a tag with Beautiful Soup
Tags: python, python-3.x, beautifulsoup

I need help parsing the HTML tag below without using any regular expressions. I need to extract the string 'House NO/2012/'.
<p class="cold" style="clear:both">House NO /2012/</p>

How about the following?
I can think of three ways to do it:
1. BeautifulSoup (using CSS selectors)
2. lxml (using XPath selectors)
3. Regular expressions (using patterns)
The code is as follows:
# pip install beautifulsoup4
from bs4 import BeautifulSoup

html = '<p class="cold" style="clear:both">House NO /2012/</p>'
soup = BeautifulSoup(html, "html.parser")
# Use soup.find('p', class_='cold') if the page has more than one <p> tag
paragraph = soup.find('p')
print('BeautifulSoup:', paragraph.text)
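
The same lookup can also be written with a CSS selector instead of find. This is only a minimal sketch: select_one is part of BeautifulSoup's API and returns the first match or None, while the helper name extract_cold_text is made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_cold_text(html):
    """Return the text of the first <p class="cold"> element, or None.

    extract_cold_text is a hypothetical helper name, not part of bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    # "p.cold" is a CSS selector: a <p> tag whose class list contains "cold"
    node = soup.select_one("p.cold")
    return node.get_text(strip=True) if node is not None else None


html = '<p class="cold" style="clear:both">House NO /2012/</p>'
print('BeautifulSoup (CSS selector):', extract_cold_text(html))
```

Returning None when nothing matches avoids the AttributeError you would get from calling .text on a missing tag.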

# pip install lxml
from lxml import etree

html = '<p class="cold" style="clear:both">House NO /2012/</p>'
# etree.fromstring expects well-formed markup, which this snippet is
root = etree.fromstring(html)
# Use '//p[@class="cold"]' to match only paragraphs with that class
paragraphs = root.xpath('//p')
print('lxml:', paragraphs[0].text)
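
Real pages are often not well-formed XML, and etree.fromstring raises an error on unclosed tags. A hedged sketch of the same XPath lookup using lxml's HTML parser, which tolerates such markup; the helper name first_cold_paragraph and the sample markup with an unclosed <p> are assumptions for illustration.

```python
from lxml import html as lxml_html


def first_cold_paragraph(markup):
    """Return the text of the first <p class="cold">, or None if absent.

    first_cold_paragraph is a hypothetical helper name, not part of lxml.
    """
    # lxml.html.fromstring uses a forgiving HTML parser, unlike etree.fromstring
    root = lxml_html.fromstring(markup)
    matches = root.xpath('//p[@class="cold"]')
    return matches[0].text_content().strip() if matches else None


# Illustrative markup: the second <p> is never closed
markup = '<div><p class="cold" style="clear:both">House NO /2012/<p>unclosed</div>'
print('lxml.html:', first_cold_paragraph(markup))
```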
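
# Shown for comparison only; the question asks for a solution without regex
import re

html = '<p class="cold" style="clear:both">House NO /2012/</p>'
# Capture the text between the opening and closing <p> tags (non-greedy)
match = re.search(r'<p[^>]*>(.*?)</p>', html)
print('Regular Expressions:', match.group(1))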

adel, if any of the answers below helped you, remember to accept it by clicking the check mark to its left.

Output:
BeautifulSoup: House NO /2012/
lxml: House NO /2012/
Regular Expressions: House NO /2012/
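
If installing a third-party package is not an option, the standard library's html.parser module can also do this without any regular expressions. The following is a rough sketch rather than a drop-in solution: the class name ParagraphText and the function paragraph_text are hypothetical, and it assumes the target is a <p> tag identified by its class.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text inside <p> tags that carry a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.inside = False   # True while inside a matching <p>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and self.css_class in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def paragraph_text(html, css_class="cold"):
    """Return the stripped text of all matching paragraphs, joined together."""
    parser = ParagraphText(css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


html = '<p class="cold" style="clear:both">House NO /2012/</p>'
print('html.parser:', paragraph_text(html))  # html.parser: House NO /2012/
```

This is more code than the BeautifulSoup version, so it mainly makes sense when adding dependencies is a problem.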