Python: How do I preserve <br> as newlines with lxml.html text_content() or equivalent?


When extracting the text content of an lxml element, I want <br> tags to be preserved as \n.

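Sample code:

fragment = '<div>This is a text node.<br/>This is another text node.<br/><br/><span>And a child element.</span><span>Another child,<br> with two text nodes</span></div>'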

Output:

> h.text_content()
'This is a text node.This is another text node.And a child element.Another child, with two text nodes'

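lxml stores the text that follows an element in that element's tail, so prepending a \n character to the tail of each <br> element gives the result you expect: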

>>> import lxml.html as html
>>> fragment = '<div>This is a text node.<br/>This is another text node.<br/><br/><span>And a child element.</span><span>Another child,<br> with two text nodes</span></div>'
>>> doc = html.document_fromstring(fragment)
>>> for br in doc.xpath("*//br"):
...     br.tail = "\n" + br.tail if br.tail else "\n"
...
>>> doc.text_content()
'This is a text node.\nThis is another text node.\n\nAnd a child element.Another child,\n with two text nodes'
>>> fragment
'<div>This is a text node.<br/>This is another text node.<br/><br/><span>And a child element.</span><span>Another child,<br> with two text nodes</span></div>'
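Note that the loop above changes the parsed tree in place (only the original `fragment` string is untouched). If that matters, the same idea can be wrapped in a small helper that works on a copy. This is only a sketch: `text_with_newlines` is a hypothetical name, not part of lxml, and it assumes that copying the element with `copy.deepcopy` is acceptable for the size of document involved.

```python
import copy

import lxml.html


def text_with_newlines(element):
    """Return element.text_content() with every <br> rendered as a newline.

    Hypothetical helper (not an lxml API). It operates on a deep copy,
    so the caller's tree is left unmodified.
    """
    clone = copy.deepcopy(element)
    # iter("br") visits every <br> in the copied subtree, in document order.
    for br in clone.iter("br"):
        # Text following a <br> is stored in its .tail (None when absent).
        br.tail = "\n" + (br.tail or "")
    return clone.text_content()


fragment = (
    "<div>This is a text node.<br/>This is another text node.<br/><br/>"
    "<span>And a child element.</span>"
    "<span>Another child,<br> with two text nodes</span></div>"
)
root = lxml.html.fromstring(fragment)
result = text_with_newlines(root)
print(repr(result))
```

Calling `root.text_content()` afterwards still returns the text without newlines, because only the copy was changed.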


What does it look like after parsing?
Thanks, I just discovered this myself while trying to run a test with the example HTML I posted.
