Python: escaping strings for use in XML

python xml security

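I'm using Python's xml.dom.minidom to create an XML document. (Logical structure -> XML string, not the other way around.)

How do I make it escape the strings I provide so they can't mess up the XML?

Something like this?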

>>> from xml.sax.saxutils import escape
>>> escape("< & >")   
'&lt; &amp; &gt;'


Do you mean you do something like this:

from xml.dom.minidom import Text, Element

t = Text()
e = Element('p')

t.data = '<bar><a/><baz spam="eggs"> & blabla &entity;</>'
e.appendChild(t)


Then you will get a nicely escaped XML string:

>>> e.toxml()
'<p>&lt;bar&gt;&lt;a/&gt;&lt;baz spam=&quot;eggs&quot;&gt; &amp; blabla &amp;entity;&lt;/&gt;</p>'
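The same automatic escaping applies when the tree is built through a Document object, for attribute values as well as text nodes. The following is a minimal sketch, not part of the original answer; the element name, attribute name, and sample strings are made up for illustration.

```python
from xml.dom.minidom import Document, parseString

# Build a tiny document; minidom escapes text and attribute values on output.
doc = Document()
root = doc.createElement("p")                       # hypothetical element name
root.setAttribute("title", 'a "quoted" <value>')    # hypothetical attribute
root.appendChild(doc.createTextNode("1 < 2 & 3 > 2"))
doc.appendChild(root)

xml = root.toxml()

# Parsing the serialized string gives back the original, unescaped data.
parsed = parseString(xml).documentElement
round_trip_text = parsed.firstChild.data
round_trip_attr = parsed.getAttribute("title")
```

Exactly which characters are escaped inside text nodes (quotes in particular) can differ between Python versions, so checking the round trip is more robust than comparing against a fixed output string.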


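If you don't want to import another project and you already have cgi (Python 3.7 or earlier, where cgi.escape still exists), you could use this: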

>>> import cgi
>>> cgi.escape("< & >")
'&lt; &amp; &gt;'

Note, however, that the readability of this code suffers, so you should probably put it in a function that better describes your intent (and write unit tests for it while you're at it ;):

def xml_escape(s):
    return cgi.escape(s)  # escapes "<", ">" and "&"
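cgi.escape was deprecated in Python 3.2 and removed in Python 3.8, so on current interpreters the wrapper above no longer works. A minimal sketch of the same idea using html.escape from the standard library follows; the wrapper name xml_escape is kept from the answer above, and the quote parameter is just passed through.

```python
import html

def xml_escape(text, quote=True):
    """Escape &, < and >; with quote=True also escape " and '."""
    return html.escape(text, quote=quote)

basic = xml_escape("< & >")
quotes = xml_escape('"\'')
text_only = xml_escape('say "hi"', quote=False)
```

Note that html.escape writes the apostrophe as the numeric reference &#x27; rather than &apos;. Both are valid in XML.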

xml.sax.saxutils does not escape quotation characters (")

So here is another one:

def escape( str ):
    str = str.replace("&", "&amp;")
    str = str.replace("<", "&lt;")
    str = str.replace(">", "&gt;")
    str = str.replace("\"", "&quot;")
    return str
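The order of the replacements matters: the ampersand has to be handled first, otherwise the ampersands introduced by the other replacements get escaped a second time. A small sketch to illustrate the point, with hypothetical function names:

```python
def escape_amp_first(text):
    # Correct order: escape "&" before introducing any new "&...;" entities.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    return text

def escape_amp_last(text):
    # Wrong order: the "&" in "&lt;" is itself escaped afterwards.
    text = text.replace("<", "&lt;")
    text = text.replace("&", "&amp;")
    return text

good = escape_amp_first("a < b")
bad = escape_amp_last("a < b")
```

Here good is 'a &lt; b', while bad ends up as 'a &amp;lt; b', which an XML parser would read back as the literal text "a &lt; b".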


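If you look it up, xml.sax.saxutils only does string replacement.

xml.sax.saxutils.escape only escapes &, < and > by default, but it does provide an entities parameter to additionally escape other strings: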

from xml.sax.saxutils import escape

def xmlescape(data):
    return escape(data, entities={
        "'": "&apos;",
        "\"": "&quot;"
    })
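For completeness, xml.sax.saxutils also offers unescape, which takes a matching entities mapping in the reverse direction, and quoteattr, which escapes a value and wraps it in quotes for use as an attribute. The sketch below restates xmlescape so it runs on its own; xmlunescape, EXTRA, REVERSE and the sample strings are hypothetical names chosen for illustration.

```python
from xml.sax.saxutils import escape, unescape, quoteattr

# Extra entities on top of the defaults (&, <, >).
EXTRA = {"'": "&apos;", '"': "&quot;"}
# Reverse mapping for unescape: entity -> character.
REVERSE = {entity: char for char, entity in EXTRA.items()}

def xmlescape(data):
    return escape(data, entities=EXTRA)

def xmlunescape(data):
    return unescape(data, entities=REVERSE)

original = 'Tom & "Jerry" <cat\'s rival>'
escaped = xmlescape(original)
restored = xmlunescape(escaped)

# quoteattr picks the quote character that needs the least escaping.
attr = quoteattr('say "hi"')
```

With this input, escaped is 'Tom &amp; &quot;Jerry&quot; &lt;cat&apos;s rival&gt;' and restored equals the original string.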

xml.sax.saxutils.escape uses str.replace() internally, so you can also skip the import and write your own function, as shown in MichealMoser's answer.

import re

xml_special_chars = {
    "<": "&lt;",
    ">": "&gt;",
    "&": "&amp;",
    "'": "&apos;",
    '"': "&quot;",
}

xml_special_chars_re = re.compile("({})".format("|".join(xml_special_chars)))

def escape_xml_special_chars(unescaped):
    return xml_special_chars_re.sub(lambda match: xml_special_chars[match.group(0)], unescaped)


All the magic happens in re.sub(): the repl argument accepts not only strings, but also functions.
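To see the callable form of repl in isolation, here is a small sketch unrelated to XML; the function name and sample sentence are made up. re.sub calls the function once per match and uses its return value as the replacement text.

```python
import re

def double_number(match):
    # match is an re.Match object; group(0) is the matched text.
    return str(int(match.group(0)) * 2)

result = re.sub(r"\d+", double_number, "3 apples and 10 pears")
```

Here result is '6 apples and 20 pears'. The XML escaping function above uses the same mechanism, with a dictionary lookup in place of the arithmetic.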

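Andrey Vlasovskikh's accepted answer is the most complete answer to the OP. But this topic is one of the most frequent search results for "python escape xml", so I wanted to offer a timing comparison of the three solutions discussed in this thread, along with a fourth option, which we chose to deploy because of its better performance.

All four solutions rely on either native Python string handling or the Python standard library.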

Option 1 - regex
This solution uses the Python regex library. It is the slowest of the four:

import re

table = {
    "<": "&lt;",
    ">": "&gt;",
    "&": "&amp;",
    "'": "&apos;",
    '"': "&quot;",
}
pat = re.compile("({})".format("|".join(table)))

def xmlesc(txt):
    return pat.sub(lambda match: table[match.group(0)], txt)

>>> %timeit xmlesc('<&>"\'')
1.48 µs ± 1.73 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Option 3 - str.replace
This solution uses the string replace() method. Under the hood, it implements logic similar to Python's xml.sax.saxutils. The saxutils code has a for loop that costs some performance, which makes this version slightly faster.

def xmlesc(txt):
    txt = txt.replace("&", "&amp;")
    txt = txt.replace("<", "&lt;")
    txt = txt.replace(">", "&gt;")
    txt = txt.replace('"', "&quot;")
    txt = txt.replace("'", "&apos;")
    return txt

>>> %timeit xmlesc('<&>"\'')
503 ns ± 0.725 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

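Any XML DOM serialiser will escape character data appropriately as it goes out... that's what DOM manipulation is for, to save you from getting your hands dirty with the markup.

This is just what I was looking for. Most of my XML handling is done using lxml, and I wonder if importing (yet) another XML module would be too much clutter. Is there an equivalent in lxml? (Can't seem to find one.)

This does not handle escaping of quotes.

>>> from xml.sax.saxutils import quoteattr
>>> quoteattr('value containing " a double-quote \' and an apostrophe')
'"value containing &quot; a double-quote \' and an apostrophe"'

This will cause existing escaped characters to become malformed. For example, &amp;&amp; becomes &amp;amp;&amp;amp;

Re: "This will cause existing escaped characters to become malformed" - this is wrong. The existing escapes will not become malformed, but double-escaped. This is expected and correct behaviour: if your input contains both escaped and unescaped characters, then either it's invalid input, or you want the escaped ones to be displayed verbatim, as in the text "In HTML, & is encoded using &amp;", where the final "&amp;" should be shown to the user in this form. Double-escaping is what you want here.

You may also want to escape the single quote character, i.e. '

It's best to avoid using the built-in name str as your variable name.

You forgot str = str.replace("'", "&apos;").

An alternative to str = str.replace("\"", "&quot;") is str = str.replace('"', "&quot;"), which I think is more readable, as the backslash (\) looks out of place.

If you don't copy-paste from here, you should notice that the first replacement is the ampersand (&). If it isn't the first statement, you will replace the ampersands produced by the other statements...

Also worth noting that this API is now deprecated.

How would you do this for a file? For example: from xml.dom.minidom import parse, parseString; dom1 = parse('Test-bla.ddf')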
Option 2 - xml.sax.saxutils

from xml.sax.saxutils import escape

def xmlesc(txt):
    return escape(txt, entities={"'": "&apos;", '"': "&quot;"})

>>> %timeit xmlesc('<&>"\'')
832 ns ± 4.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Option 4 - str.translate

table = str.maketrans({
    "<": "&lt;",
    ">": "&gt;",
    "&": "&amp;",
    "'": "&apos;",
    '"': "&quot;",
})

def xmlesc(txt):
    return txt.translate(table)

>>> %timeit xmlesc('<&>"\'')
352 ns ± 0.177 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
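The timings above come from IPython's %timeit. To repeat the comparison in a plain script, something like the following sketch should work using the standard timeit module. The function names (esc_regex, esc_saxutils, esc_replace, esc_translate) and the iteration count are arbitrary choices, and absolute numbers will differ from machine to machine.

```python
import re
import timeit
from xml.sax.saxutils import escape

ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}
PATTERN = re.compile("|".join(re.escape(ch) for ch in ENTITIES))
TABLE = str.maketrans(ENTITIES)

def esc_regex(txt):
    return PATTERN.sub(lambda m: ENTITIES[m.group(0)], txt)

def esc_saxutils(txt):
    return escape(txt, entities={"'": "&apos;", '"': "&quot;"})

def esc_replace(txt):
    # "&" must come first so later entities are not escaped again.
    txt = txt.replace("&", "&amp;")
    txt = txt.replace("<", "&lt;")
    txt = txt.replace(">", "&gt;")
    txt = txt.replace('"', "&quot;")
    txt = txt.replace("'", "&apos;")
    return txt

def esc_translate(txt):
    return txt.translate(TABLE)

CANDIDATES = (esc_regex, esc_saxutils, esc_replace, esc_translate)

if __name__ == "__main__":
    sample = '<&>"\''
    runs = 20_000
    for func in CANDIDATES:
        seconds = timeit.timeit(lambda: func(sample), number=runs)
        print(f"{func.__name__:14s} {seconds / runs * 1e9:8.0f} ns per call")
```

All four functions return the same string for the same input, so whichever one measures fastest on your interpreter can be swapped in without changing behaviour.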