Python: Parsing a tag with Beautiful Soup

I need help parsing the HTML tag below without using any regular expressions.
I need to extract the string 'House NO /2012/' from it:

<p class="cold" style="clear:both">House NO /2012/</p>

How about the following?


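I can think of three ways to do it:

  • BeautifulSoup (using a CSS selector)
  • lxml (using an XPath selector)
  • Regular expressions (using a pattern)

The code is as follows: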

    # pip install bs4
    from bs4 import BeautifulSoup as bs
    
    html = '<p class="cold" style="clear:both">House NO /2012/</p>'
    soup = bs(html, "html.parser")  # keep the parsed tree separate from the raw string
    
    # find('p') returns the first <p>; use find('p', {'class': 'cold'}) if there are more p tags
    paragraph = soup.find('p')
    print('BeautifulSoup:', paragraph.text)
    
    
    # pip install lxml
    from lxml import etree
    
    html = '<p class="cold" style="clear:both">House NO /2012/</p>'
    source = etree.fromstring(html)  # XML parser: the input must be well-formed
    paragraphs = source.xpath('//p')  # or use //p[@class="cold"]
    print('lxml:', paragraphs[0].text)
    
    
    import re
    
    html = '<p class="cold" style="clear:both">House NO /2012/</p>'
    # Non-greedy match so the capture stops at the first closing </p>
    match = re.search(r'<p[^>]*>(.*?)</p>', html)
    print('Regular Expressions:', match.group(1))
    
Output:

    BeautifulSoup: House NO /2012/
    lxml: House NO /2012/
    Regular Expressions: House NO /2012/
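
The BeautifulSoup option is described above as using a CSS selector, while the posted code calls find(). A minimal sketch of the selector-based variant might look like the following. The helper name extract_text and the "p.missing" selector are made up for illustration; select_one and get_text are standard bs4 calls. It also guards against the tag not being found, in which case select_one returns None.

```python
# pip install bs4
from bs4 import BeautifulSoup

html = '<p class="cold" style="clear:both">House NO /2012/</p>'
soup = BeautifulSoup(html, "html.parser")


def extract_text(soup, selector):
    """Return the stripped text of the first element matching a CSS selector.

    Hypothetical helper for illustration; returns None when nothing matches.
    """
    node = soup.select_one(selector)
    if node is None:
        return None
    return node.get_text(strip=True)


print(extract_text(soup, "p.cold"))     # the paragraph with class "cold"
print(extract_text(soup, "p.missing"))  # no such element, so None
```

No regular expression is involved here, which matches the constraint in the question.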