Python: How can I use XSLT to convert nodes in XML to CDATA?

Tags: python, xml, xslt, lxml, cdata

I have a source.xml file with the following structure:

<products>
    <product>
        <id>1</id>
        <description>
            <style>
            table{
            some css here
            }
            </style>
            <descr>
            <div>name of producer like ABC&DEF</div>
            <table>
                <th>parameters</th>
                <tr><td>name of param 1 e.g POWER CONSUMPTION</td>
                    <td>value of param 1 with e.g < 100 W</td></tr>
            </table>
            </descr>
        </description>
    </product>
.....................
</products>

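In Sublime Text 3, I always get the same error:

lxml.etree.XMLSyntaxError: StartTag: invalid element name, {line and column of the first occurrence of an illegal character}

I am sure the solution is right in front of me in the links above, but I cannot see it.

Or perhaps I cannot find it because I do not know how to ask the right question. Please help, I am new to coding.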

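The input XML is not well-formed, so I had to fix it first: the raw & and < characters in the text must be written as &amp; and &lt;. That seems to be the reason it fails on your end.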

XML

<products>
    <product>
        <id>1</id>
        <description>
            <style>table{
            some css here
            }</style>
            <descr>
                <div>name of producer like ABC&amp;DEF</div>
                <table>
                    <th>parameters</th>
                    <tr>
                        <td>name of param 1 e.g POWER CONSUMPTION</td>
                        <td>value of param 1 with e.g &lt; 100 W</td>
                    </tr>
                </table>
            </descr>
        </description>
    </product>
</products>
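To see why the original input fails while the fixed one parses, here is a minimal sketch using lxml. The two one-line fragments are shortened stand-ins for the full file, and the variable names are illustrative assumptions rather than anything from the question.

```python
from lxml import etree as ET
from xml.sax.saxutils import escape

# A raw "<" inside text is not allowed in XML; it must be written as "&lt;".
broken = "<td>value of param 1 with e.g < 100 W</td>"
fixed = "<td>value of param 1 with e.g &lt; 100 W</td>"

try:
    ET.fromstring(broken)
    broken_parsed = True
except ET.XMLSyntaxError as exc:
    broken_parsed = False
    print("not well-formed:", exc)

# The fixed fragment parses, and the parser hands back the unescaped text.
td = ET.fromstring(fixed)
print(td.text)

# If the text is generated by your own code, escape it before embedding it in XML.
producer = escape("ABC&DEF")
print(producer)
```

Escaping only helps for text you generate yourself; a file that already contains raw & or < characters has to be repaired before any XML parser or XSLT processor can read it.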



XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="description">
        <xsl:copy>
            <xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
            <xsl:copy-of select="*"/>
            <xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>



Output

<products>
  <product>
    <id>1</id>
    <description><![CDATA[
      <style>table{
            some css here
            }
      </style>
      <descr>
        <div>name of producer like ABC&amp;DEF</div>
        <table>
          <th>parameters</th>
          <tr>
            <td>name of param 1 e.g POWER CONSUMPTION</td>
            <td>value of param 1 with e.g &lt; 100 W</td>
          </tr>
        </table>
      </descr>]]>
    </description>
  </product>
</products>
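To run this kind of stylesheet from Python, a minimal sketch with lxml might look like the following. The shortened input document and the variable names are illustrative assumptions. The stylesheet is declared as version 1.0 here because libxslt, which lxml builds on, implements XSLT 1.0, and indentation is left out to keep the result easy to inspect. Note that disable-output-escaping is an optional feature that a processor may ignore, so this relies on the serializer honoring it.

```python
from lxml import etree as ET

# Shortened, well-formed stand-in for the fixed source.xml (assumption).
xml = ET.fromstring(
    "<products><product><id>1</id>"
    "<description><div>name of producer like ABC&amp;DEF</div></description>"
    "</product></products>"
)

xsl = ET.fromstring('''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- identity template: copy everything unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- wrap the children of description in hand-written CDATA markers -->
  <xsl:template match="description">
    <xsl:copy>
      <xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
      <xsl:copy-of select="*"/>
      <xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>''')

transform = ET.XSLT(xsl)
output = str(transform(xml))  # serialized by the XSLT processor
print(output)

# Re-parsing shows that the description now holds markup as plain text.
description_text = ET.fromstring(output).findtext("product/description")
print(description_text)
```

Inside the CDATA section the entity reference `&amp;` is kept literally, so a consumer that later parses the embedded markup will still see a correctly escaped ampersand.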



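In my view, a clean approach is to serialize all the elements you need as plain text with a serialize function, then name the parent container in the cdata-section-elements attribute of the xsl:output declaration, and finally make sure the XSLT processor takes care of the serialization.

XSLT 3 now has a built-in XPath 3.1 serialize function; in Python you can use it with Saxon-C and its Python API.

For libxslt-based XSLT 1 with lxml, you can write an extension function in Python that is exposed to XSLT: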

from lxml import etree as ET

def serialize(context, nodes):
    return b''.join(ET.tostring(node) for node in nodes)


ns = ET.FunctionNamespace('http://example.com/mf')
ns['serialize'] = serialize

xml = ET.fromstring('<root><div><p>foo</p><p>bar</p></div></root>')

xsl = ET.fromstring('''<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:mf="http://example.com/mf" version="1.0">
  <xsl:output method="xml" cdata-section-elements="div" encoding="UTF-8"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
       <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="div">
    <xsl:copy>
      <xsl:value-of select="mf:serialize(node())"/>
    </xsl:copy>
 </xsl:template>
</xsl:stylesheet>''')

transform = ET.XSLT(xsl)

result = transform(xml)

result.write_output("transformed.xml")


The output is then

<?xml version="1.0" encoding="UTF-8"?>
<root><div><![CDATA[<p>foo</p><p>bar</p>]]></div></root>



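Comments:

Please edit your post and add the XSLT you tried.

Thanks @YitzhakKhabinsky. I have added the .xsl stylesheet I tried.

Thanks. Does this mean I cannot have characters like "&" and "<"?

@JWPB With those characters left unescaped, the input XML is not well-formed, so XSLT cannot process it.

Understood. So the output I posted in the question is impossible to obtain, which explains why I failed. What is the correct way to transform the XML file shown at the beginning of the question into the one you present as fixed?

I was able to convert my XML file, but everything inside elements such as td ended up escaped. Now I do not know how to "unescape" the HTML tag characters while keeping the escaped characters in the text of those tags.

Thanks @Martin Honnen. I think I will need a few days to understand what you wrote. I will give it a try.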