Python: preserving the HTML file structure after modifying it with BeautifulSoup

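I'm using Python and BeautifulSoup to find and replace some text on an HTML page. My problem is that I need to keep the file structure (indentation, whitespace, newlines, etc.) unchanged and modify only the elements I actually want to change. How can I do that?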

Both str(soup) and soup.prettify() change the source file in several ways.

P.S. Sample code:

    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)

    soup = BeautifulSoup(text)
    for element in soup.findAll(text=True):
        # leave text inside these tags untouched
        if element.parent.name not in ['style', 'script', 'head', 'title', 'pre']:
            # process() is the asker's own text-rewriting function
            element.replaceWith(process(element))
    result = str(soup)

I'd say there is no easy way to do this (or perhaps no way at all). From the documentation of BeautifulStoneSoup:

__str__(self, encoding='utf-8', prettyPrint=False, indentLevel=0)
    Returns a string or Unicode representation of this tag and
    its contents. To get Unicode, pass None for encoding.

    NOTE: since Python's HTML parser consumes whitespace, this
    method is not certain to reproduce the whitespace present in
    the original string.

According to that note, the original whitespace is lost once the document is converted to the internal representation.

Perhaps another library can do it?
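One possible direction, sketched here under stated assumptions rather than as a tested recipe: instead of re-serializing a parse tree, use the standard library's html.parser.HTMLParser only to locate the text runs (via getpos()), then splice the replacements into the original string so that every other character is copied through unchanged. The names TextSpanCollector and replace_text are hypothetical. Known limitations of this sketch: text is split at character references such as &amp;, process() sees the raw (still escaped) text, and whatever it returns is inserted verbatim, so escaping < and & is the caller's job.

```python
from html.parser import HTMLParser

# Same list as in the question: text under these tags is left alone.
SKIP_TAGS = {"style", "script", "head", "title", "pre"}


class TextSpanCollector(HTMLParser):
    """Hypothetical helper: records (start, end) offsets of text runs in the source."""

    def __init__(self, source_text):
        # convert_charrefs=False makes handle_data() receive raw source slices,
        # so len(data) equals the number of source characters it covers.
        super().__init__(convert_charrefs=False)
        self.source_text = source_text
        # Offset of the first character of each line; getpos() reports
        # (1-based line, 0-based column) and we need absolute indexes.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(source_text) if ch == "\n"]
        self.spans = []
        self.skip_depth = 0

    def collect(self):
        self.feed(self.source_text)
        self.close()
        return self.spans

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Ignore text inside skipped tags and whitespace-only runs.
        if self.skip_depth or not data.strip():
            return
        line, col = self.getpos()  # position where this text run starts
        start = self.line_starts[line - 1] + col
        self.spans.append((start, start + len(data)))


def replace_text(source, process):
    """Apply process() to each text run; copy all other characters unchanged."""
    pieces, last = [], 0
    for start, end in TextSpanCollector(source).collect():
        pieces.append(source[last:start])  # markup and whitespace before the run
        pieces.append(process(source[start:end]))
        last = end
    pieces.append(source[last:])
    return "".join(pieces)


src = "<div>\n  <p class='x'>hi<br>there</p>\n</div>\n"
print(replace_text(src, str.upper))
```

Because nothing is re-serialized, the single-quoted attribute, the <br> tag and all indentation stay exactly as they were in the input; only the text runs returned by process() differ.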