Python BeautifulSoup does not parse every tag of the HTML

Tags: python, parsing, beautifulsoup, html5lib

I have a problem with BeautifulSoup not fully parsing the HTML I receive. I tried both the lxml and the html5lib parsers and ran into the same problem with each.

html = '<td style="vertical-align: top">1</td> <td style="vertical-align: top"><span class="ui-icon country flg-fr"></span>\t</td><td class="pn"><a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span class="player-meta-data">29</span><span class="player-meta-data">,  GK  </span></td>   <td class="ShotsTotal ">0\t</td><td class="ShotOnTarget ">0\t</td><td class="KeyPassTotal ">0\t</td><td class="PassSuccessInMatch ">88\t</td><td class="DuelAerialWon ">0\t</td><td class="Touches ">35\t</td><td class="rating ">6.24</td> <td style="text-align: left"><span class="incident-wrapper"></span></td> '

parsed_html = ipdb> BeautifulSoup(html, 'html5lib')
<html><head></head><body>1 <span class="ui-icon country flg-fr"></span> <a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span class="player-meta-data">29</span><span class="player-meta-data">,  GK  </span>   0   0   0   88  0   35  6.24 <span class="incident-wrapper"></span> </body></html>

It works for me. I ran the following code (using beautifulsoup4==4.4.1):

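from bs4 import BeautifulSoup

html = """
<td style="vertical-align: top">1</td>
<td style="vertical-align: top"><span class="ui-icon country flg-fr"></span>\t</td>
<td class="pn"><a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span class="player-meta-data">29</span><span class="player-meta-data">,  GK  </span></td>
<td class="ShotsTotal ">0\t</td>
<td class="ShotOnTarget ">0\t</td>
<td class="KeyPassTotal ">0\t</td>
<td class="PassSuccessInMatch ">88\t</td>
<td class="DuelAerialWon ">0\t</td>
<td class="Touches ">35\t</td>
<td class="rating ">6.24</td>
<td style="text-align: left"><span class="incident-wrapper"></span></td>
"""
parsed_html = BeautifulSoup(html, 'html5lib')
# Note: this prints the input string `html`, not the parsed `parsed_html` object.
print(html)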
This is the HTML it printed:

<td style="vertical-align: top">1</td>
<td style="vertical-align: top"><span class="ui-icon country flg-fr"></span>    </td>
<td class="pn"><a class="player-link" href="/Players/25604">Hugo Lloris <span class="incident-wrapper"></span> </a><span
        class="player-meta-data">29</span><span class="player-meta-data">,  GK  </span></td>
<td class="ShotsTotal ">0   </td>
<td class="ShotOnTarget ">0 </td>
<td class="KeyPassTotal ">0 </td>
<td class="PassSuccessInMatch ">88  </td>
<td class="DuelAerialWon ">0    </td>
<td class="Touches ">35 </td>
<td class="rating ">6.24</td>
<td style="text-align: left"><span class="incident-wrapper"></span></td>

I did not find anything missing.

Why are you using ipdb? Removing it and using lxml should work fine.
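
The output shown in the question matches the HTML5 parsing rules: a `<td>` start tag that appears outside a table is ignored, so html5lib keeps the cell text and drops the cell tags. The sketch below is not from the thread. It swaps in the standard library's `html.parser` tokenizer instead of BeautifulSoup, to show that a parser which does not enforce table structure still sees every cell. The names `CellCollector` and `HTML` are hypothetical, and the input is a shortened copy of the row from the question.

```python
from html.parser import HTMLParser

# Shortened copy of the row from the question (an assumption made for brevity).
HTML = (
    '<td style="vertical-align: top">1</td> '
    '<td class="pn"><a class="player-link" href="/Players/25604">Hugo Lloris '
    '<span class="incident-wrapper"></span> </a>'
    '<span class="player-meta-data">29</span>'
    '<span class="player-meta-data">,  GK  </span></td>'
    '<td class="ShotsTotal ">0\t</td>'
    '<td class="rating ">6.24</td>'
)


class CellCollector(HTMLParser):
    """Collect the text of each <td>, even when no <table> encloses it."""

    def __init__(self):
        super().__init__()
        self.cells = []      # text of every finished cell
        self._parts = None   # text pieces of the open cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._parts is not None:
            # Collapse runs of whitespace (including tabs) to single spaces.
            self.cells.append(" ".join("".join(self._parts).split()))
            self._parts = None


collector = CellCollector()
collector.feed(HTML)
collector.close()
print(collector.cells)
```

Within BeautifulSoup itself, passing 'html.parser' as the parser name selects a builder based on this same tokenizer, so it may be worth trying on fragments like this one. Wrapping the fragment in `<table><tr>` and `</tr></table>` before handing it to html5lib is another option to test, since the cells then sit in a valid table context.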