Python: how to parse an .htm file with BeautifulSoup

I am trying to parse a local .htm file with BeautifulSoup.

.htm is the file type.

from bs4 import BeautifulSoup

with open('locfile.htm') as fp:  # no encoding given: Python uses the platform default
    soup = BeautifulSoup(fp, "html5lib")
print(soup)

I tried three different parsers but got the same result. Example with html5lib:

<html><body><p>t a b l e   i d = " T a b l a D a t a "   c l a s s = " T a b l a    w i d t h = " 9 0 %  &gt; 
 t r &gt;....

.....

But the inner tags have been lost or converted into HTML escape sequences.

How can I preserve the tags?
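The spaced-out letters in the output above are consistent with UTF-16LE text being read with a single-byte codec: each ASCII character is stored as two bytes, the second of which is NUL, so every letter ends up followed by a NUL character that displays as a gap. The minimal sketch below reproduces the effect, assuming the file is UTF-16LE (as the encoding in the eventual fix suggests) and using latin-1 as a stand-in for whatever default codec the platform actually picked.

```python
# Assumption: the original file is UTF-16 little-endian.
markup = '<table id="TablaData">'

# On disk, each ASCII character takes two bytes: the character, then a NUL byte.
raw = markup.encode("utf-16-le")
print(raw[:8])  # b'<\x00t\x00a\x00b\x00'

# Decoding with a single-byte codec (latin-1 here, standing in for the
# platform default) keeps every NUL, so "<" is no longer directly followed
# by a tag name; this is likely why the parser sees text instead of tags.
wrong = raw.decode("latin-1")
print(repr(wrong[:8]))  # '<\x00t\x00a\x00b\x00'

# Decoding with the matching codec restores the original markup.
right = raw.decode("utf-16-le")
print(right)  # <table id="TablaData">
```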

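I finally found the solution.

The problem was the encoding of the original file (it is UTF-16 little-endian), so it has to be opened with that encoding: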

with open('locfile.htm', encoding="utf-16LE") as fp:
    soup = BeautifulSoup(fp, "html5lib")
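If the encoding is not known in advance, one option is to look for a byte-order mark (BOM) at the start of the file before opening it. The sketch below uses only the standard library; the helper names sniff_bom_encoding and sniff_file_encoding are hypothetical, and a file without a BOM still needs its encoding supplied by hand (for example "utf-16LE"). Python's "utf-16" codec consumes the BOM and uses it to choose the byte order, whereas "utf-16-le" leaves it in the decoded text as U+FEFF, which is why the sketch maps both UTF-16 marks to "utf-16".

```python
import codecs

# Known BOMs and a codec that strips each one while decoding.
# The 4-byte UTF-32 marks are checked first because BOM_UTF32_LE
# (FF FE 00 00) starts with BOM_UTF16_LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom_encoding(head):
    """Hypothetical helper: return a codec name for the BOM at the start of
    `head` (bytes), or None if no recognizable BOM is present."""
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return None


def sniff_file_encoding(path):
    """Hypothetical helper: read the first 4 bytes of `path` and sniff its BOM."""
    with open(path, "rb") as fp:
        return sniff_bom_encoding(fp.read(4))
```

Usage would then be something like encoding = sniff_file_encoding('locfile.htm') or 'utf-16LE', followed by open('locfile.htm', encoding=encoding) as before. A file that begins with FF FE 00 00 is ambiguous (UTF-32LE, or UTF-16LE text whose first character is NUL); this sketch assumes UTF-32.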

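We can't help you with your code unless you post the code you wrote. You are asking us to imagine what you wrote and then guess why it doesn't work the way you expect. Does the file look normal when you view it in a text editor? Could you post a link to the file?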
