Getting a div from HTML with Python

python html regex

I want to get the value inside a particular div from an HTML page:

    <div class="well credit">

      <div class="span2">
          <h3><span>
              $ 5.402 
          </span></h3>
      </div>

    </div>


That is, I want to end up with:

$ 5.402

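I have already done this with a regular expression (re.search()), but because the div sits inside a huge HTML document, finding it takes a long time.

Is there a way to do this faster without external libraries?

Thanks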

I would use BeautifulSoup.

To get everything inside div tags, just do:

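from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # parse the HTML string into a searchable tree
soup.find_all('div')  # list of every <div> element (findAll is the older spelling)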

To get the value inside the span, you can do:

soup.find('span').get_text()

There are many different ways to get the information you need.

Good luck, hope this helps.
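Putting the two calls together, here is a minimal sketch of how the lookup could be narrowed to the div from the question. The sample markup is copied from the question; the variable names credit_div and price are only illustrative, and the choice of the built-in "html.parser" backend is an assumption, not something the answer specifies.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<div class="well credit">
  <div class="span2">
      <h3><span>
          $ 5.402
      </span></h3>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match the div whose class attribute is exactly "well credit",
# then take the first <span> nested anywhere below it.
credit_div = soup.find("div", class_="well credit")
price = credit_div.find("span").get_text(strip=True)
print(price)
```

Searching from credit_div rather than from soup keeps the span lookup from matching an unrelated span elsewhere on the page.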

Scrapy might also be a solution; its documentation is worth reading.

Output:

<div class="span2">
    <h3><span>
        $ 5.402 
    </span></h3>
</div>


$ 5.402

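Python has only one HTML parser in its standard library, and it is very low-level, so if you want to work with HTML comfortably you will in practice have to install an HTML parsing library.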
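For reference, this is roughly what using the low-level standard-library parser looks like. It is only a sketch, assuming the markup from the question: the class name CreditPriceParser and its attributes are made up for illustration, and it tracks div nesting by hand, which is exactly the bookkeeping a higher-level library does for you.

```python
from html.parser import HTMLParser


class CreditPriceParser(HTMLParser):
    """Collect the text of the first <span> inside <div class="well credit">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # 0 means we are outside the target div
        self.in_span = False  # True while inside the span we want
        self.chunks = []      # text pieces seen inside that span
        self.price = None     # final result, set when the span closes

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1  # nested div inside the target
            elif dict(attrs).get("class") == "well credit":
                self.div_depth = 1   # entered the target div
        elif tag == "span" and self.div_depth and self.price is None:
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "span" and self.in_span:
            self.in_span = False
            self.price = "".join(self.chunks).strip()

    def handle_data(self, data):
        if self.in_span:
            self.chunks.append(data)


# Sample markup copied from the question.
html = """
<div class="well credit">
  <div class="span2">
      <h3><span>
          $ 5.402
      </span></h3>
  </div>
</div>
"""

parser = CreditPriceParser()
parser.feed(html)
print(parser.price)
```

This needs no external library, but note that feed() still walks the whole document; it does not stop once the price has been found.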

lxml is by far the fastest:

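import lxml.html

# handle: an open file object, or a path/URL, for the HTML document
root = lxml.html.parse(handle)
# text() selects the span's text node; @text would look for an attribute named "text"
price = root.xpath('//div[@class="well credit"]//span/text()')[0].strip()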

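If you want it to be even faster, walk the tree with root.iter and stop as soon as you reach the right element. Note that by then lxml.html.parse has already parsed the whole document, so this only cuts the search short; stopping the parse itself would need an incremental parser.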
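A minimal sketch of that early-exit idea, assuming lxml is installed. Here lxml.html.fromstring and the inline html string stand in for the parse(handle) call, and the variable names are illustrative only.

```python
import lxml.html

# Sample markup copied from the question.
html = """
<div class="well credit">
  <div class="span2">
      <h3><span>
          $ 5.402
      </span></h3>
  </div>
</div>
"""

root = lxml.html.fromstring(html)

price = None
# iter("div") yields div elements in document order, including root itself.
for div in root.iter("div"):
    if div.get("class") == "well credit":
        span = div.find(".//span")  # first span anywhere below this div
        if span is not None:
            price = span.text_content().strip()
        break  # stop walking the tree once the target div is found
print(price)
```

The break only shortens the tree walk; the document has already been fully parsed by the time the loop starts.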

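Consider using lxml.

If this is a simple case there is a built-in option, but if you know exactly how the div will be coded, why not just use find, possibly recursively?

Thanks, it works great :). I used credits = soup.find_all("div", {"class": "well credit"}) to be more specific, since the HTML contains more divs with credit in them.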
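The plain find suggestion from the comments could look something like the sketch below. It assumes the markup is written exactly as in the question (same attribute order and quoting, a bare span tag), which is a strong assumption for real pages; the helper name extract_price is hypothetical.

```python
def extract_price(html):
    """Return the text of the first <span> after <div class="well credit">, or None."""
    marker = '<div class="well credit">'
    start = html.find(marker)
    if start == -1:
        return None
    # Search for the span only after the marker, not from the top of the page.
    span_open = html.find("<span>", start + len(marker))
    if span_open == -1:
        return None
    span_close = html.find("</span>", span_open)
    if span_close == -1:
        return None
    return html[span_open + len("<span>"):span_close].strip()


# Sample markup copied from the question.
html = """
<div class="well credit">
  <div class="span2">
      <h3><span>
          $ 5.402
      </span></h3>
  </div>
</div>
"""

price = extract_price(html)
print(price)
```

Because str.find stops at the first match, this avoids scanning the rest of a large page and uses nothing outside the standard library, at the cost of breaking as soon as the markup changes.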
