Python BeautifulSoup not correctly parsing script text/template

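I have a fairly complex template script that BeautifulSoup4 fails to understand for some reason. As you can see below, BS4 only partially parses it into the tree before giving up. Why does this happen, and is there a way to fix it?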

>>> from bs4 import BeautifulSoup
>>> html = """<script id="scriptname" type="text/template"><section class="sectionname"><header><h1>Test</h1></header><table><tr><th>Title</th><td class="class"></td><th>Title</th><td class="class"></td></tr><tr><th>Title</th><td class="class"></td><th>Another row</th><td class="checksum"></td></tr></table></section></script> Other stuff I want to stay"""
>>> soup = BeautifulSoup(html)
>>> soup.findAll('script')
[<script id="scriptname" type="text/template"><section class="sectionname"><header><h1>Test</script>]

Edit: On further testing, BS3 seems able to parse it correctly for some reason:

>>> from BeautifulSoup import BeautifulSoup as bs3
>>> soup = bs3(html)
>>> soup.script
<script id="scriptname" type="text/template"><section class="sectionname"><header><h1>Test</h1></header><table><tr><th>Title</th><td class="class"></td><th>Title</th><td class="class"></td></tr><tr><th>Title</th><td class="class"></td><th>Another row</th><td class="checksum"></td></tr></table></section></script>

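Beautiful Soup's default parser sometimes fails. Beautiful Soup supports the HTML parser included in Python's standard library, but it also supports a number of third-party Python parsers.

In some cases I have had to switch to another parser, such as lxml or html5lib.

Here is an example of the above: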

from bs4 import BeautifulSoup    

soup = BeautifulSoup(markup, "lxml")

I suggest you read this.
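
To make the parser switch concrete, here is a minimal sketch that tries several parsers in turn and checks that the template inside the script tag survives intact. The helper name `parse_with_fallback`, the parser order, and the shortened `HTML` sample are assumptions made for illustration; they are not part of the original question or answer. It relies on `bs4.FeatureNotFound`, which Beautiful Soup raises when the requested parser library is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Shortened version of the markup from the question (illustrative only).
HTML = (
    '<script id="scriptname" type="text/template">'
    '<section class="sectionname"><header><h1>Test</h1></header></section>'
    '</script> Other stuff I want to stay'
)


def parse_with_fallback(markup, parsers=("html5lib", "lxml", "html.parser")):
    """Return (soup, parser_name) using the first parser that is installed.

    Hypothetical helper: the name and the default order are assumptions.
    """
    for name in parsers:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # This parser library is not installed; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is available")


soup, used = parse_with_fallback(HTML)
script = soup.find("script")

# The whole template should be kept as a single text node inside <script>.
template_text = script.string
print("parser used:", used)
print("template intact:", template_text.endswith("</section>"))
```

If the last line reports that the template is not intact, the parser in use is probably cutting the script content short at the first closing tag, and a different entry in `parsers` is worth trying.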

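Which version of BS are you using?

I'm using version 4.3.2. find_all and findAll are the same. I get the same result either way: it does not find the whole tag I wrote. It should return the entire script tag, not just part of it.

Are you sure this is all the information? On my local machine this code works perfectly!

I'll look into it; I have to install a dependency to make it work.

Yes, you can install lxml with pip directly from the command line, or download the package manually.

For some reason I couldn't get lxml to work, but I got it working with the html5lib parser. Giving you the accepted answer :) Thanks.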