HTML content is incorrect when using Python BeautifulSoup and the requests package


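The HTML I get after parsing a web page with JSoup differs from what I get with BeautifulSoup, as shown below. Has anyone run into the same problem, and can you tell me how you solved it?

Check the third line in each block: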

=======JSoup

<div class="col-full">
 <p><strong>Index Notifications</strong></p>
 <p></p><br>
<p> <br /> <b> March 28, 2014</b>
<br >
<br >



=========BeautifulSoup

<div class="col-full">
<p><strong>Index Notifications</strong></p>
<p><p> <br>
<b> March 28, 2014</b>
<br>
<br>



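When parsing broken HTML, different parsers try to repair the broken markup in different ways; there are no hard and fast rules for how such errors should be handled.

BeautifulSoup can use several different parsers, and each one will handle your content differently: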

>>> import requests
>>> from bs4 import BeautifulSoup
>>> url = 'http://www.wisdomtree.com/etfs/index-notices.aspx'
>>> html = requests.get(url).content
>>> BeautifulSoup(html, 'html.parser').find('div', class_='col-full')
<div class="col-full">
<p><strong>Index Notifications</strong></p>
<p><p> <br>
<b> March 28, 2014</b>
<br> <br>
# ... cut ...
>>> BeautifulSoup(html, 'lxml').find('div', class_='col-full')
<div class="col-full">
<p><strong>Index Notifications</strong></p>
<p></p><p> <br/>
<b> March 28, 2014</b>
<br/> <br/>
# ... cut ...
>>> BeautifulSoup(html, 'html5lib').find('div', class_='col-full')
<div class="col-full">

            <p><strong>Index Notifications</strong></p>
            <p></p><p> <br/>
<b> March 28, 2014</b>
<br/>  <br/>
# ... cut ...


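The html5lib parser is the slowest, but it will usually parse broken HTML the same way most browsers do.

Both lxml and html5lib parsed this particular part of the document the same way JSoup did.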
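If you want to compare the parsers without fetching the page, something like the following rough sketch may help. It is not taken from the original site: `BROKEN_HTML` is a hand-made stand-in for the fragment in the question, and `parse_with_each` is a hypothetical helper name. It assumes `bs4` is installed, and it simply skips `lxml` or `html5lib` if they are not.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hand-made stand-in for the broken fragment in the question (an assumption,
# not the real page): note the two unclosed <p> tags in a row.
BROKEN_HTML = """<div class="col-full">
<p><strong>Index Notifications</strong></p>
<p><p> <br>
<b> March 28, 2014</b>
<br> <br>
</div>"""


def parse_with_each(html, parsers=("html.parser", "lxml", "html5lib")):
    """Parse `html` with every available parser.

    Returns a dict mapping parser name to the repaired markup of the
    first <div class="col-full">, as a string.
    """
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # The parser library is not installed; skip it.
            continue
        div = soup.find("div", class_="col-full")
        results[name] = str(div)
    return results


if __name__ == "__main__":
    for name, markup in parse_with_each(BROKEN_HTML).items():
        print("=====", name)
        print(markup)
```

Printing the results side by side should make it easy to see which parser repairs the stray `<p>` tags the way you expect, and then you can pass that parser name explicitly to `BeautifulSoup(...)`.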