XML: Remove encoded HTML and add line breaks

xml xslt

I've been trying to solve this for hours without success. The XML looks like this:

    <description>
     Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
    sed diam nonumy eirmod tempor labore et dolore magna aliquyam erat

     &lt;p&gt;&lt;b&gt;Section B: China&lt;/b&gt;&lt;/p&gt;

     &lt;p&gt;Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
     sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam
     eratLorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
     eirmod tempor invidunt ut labore et dolore magna aliquyam erat&lt;/p&gt;

      &lt;p&gt;&lt;b&gt;Section C: Himalayan Studies&lt;/b&gt;&lt;/p&gt;

     &lt;p&gt;Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
     sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam
     eratLorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam
     nonumy eirmod tempor invidunt ut labore a aliquyam erat&lt;/p&gt;

     </description>

I tried the replace() function, but I couldn't get it to add line breaks. I also tried translate(), with no luck:

<xsl:value-of select="translate(.,
            translate(.,
            'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ',
            ''),
            '')"/>
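For reference, the nested translate() call above is the "double translate" idiom: the inner call deletes every allowed character (leaving only the unwanted ones), and the outer call deletes those unwanted characters from the original string. It can only delete single characters, so it never removes a tag as a unit and cannot insert a line break: the tag letters p and b survive, while <, >, / and newlines disappear. The Python sketch below mimics XPath translate() character by character to make that visible; the helper names xpath_translate and keep_only are hypothetical, not part of any XSLT API.

```python
import string

def xpath_translate(s: str, src: str, dst: str) -> str:
    """Mimic XPath translate(): map src[i] -> dst[i]; delete src chars that have no dst counterpart."""
    out = []
    for ch in s:
        i = src.find(ch)          # first occurrence in src wins, as in XPath
        if i == -1:
            out.append(ch)        # not in src: keep unchanged
        elif i < len(dst):
            out.append(dst[i])    # has a counterpart in dst: replace it
        # otherwise the character is in src but beyond len(dst), so it is deleted
    return "".join(out)

# Same allowed set as in the question: ASCII letters plus a space
ALLOWED = string.ascii_lowercase + string.ascii_uppercase + " "

def keep_only(s: str, allowed: str = ALLOWED) -> str:
    """Double-translate idiom: translate(s, translate(s, allowed, ''), '')."""
    unwanted = xpath_translate(s, allowed, "")  # everything that is NOT allowed
    return xpath_translate(s, unwanted, "")     # delete those characters from s

print(keep_only("<p><b>Section B: China</b></p>"))  # pbSection B Chinabp
```

The leftover "pb...bp" letters are exactly the kind of residue the question is trying to get rid of, which is why the answers switch to regex-based replace()/tokenize() or to real parsing.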

Any help on how to solve this would be greatly appreciated.

An inelegant (but working) solution:

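<!-- 1) mark each <p><b> heading with a temporary pilcrow,
     2) strip every other encoded tag (non-greedy, so text between two tags on one line survives),
     3) turn each pilcrow into a literal <br/>, written out unescaped -->
<xsl:value-of select="replace(replace(replace(., 
                     '&lt;p&gt;&lt;b&gt;', '¶'), 
                     '&lt;.*?&gt;', ''), 
                     '¶', '&lt;br/&gt;')" 
              disable-output-escaping="yes"/>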
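The same three steps can be traced with Python's re module, which is handy for experimenting outside an XSLT processor (the function name strip_and_break and the sample string are made up for illustration). In both XPath and Python regular expressions "." does not match a newline by default, so tag stripping works within one line at a time. Even so, a greedy <.*> swallows the text on a line such as <p>Lorem</p>, which is why this sketch uses the non-greedy <.*?>, the same pattern as in the XSLT 2.0 tokenize() answer.

```python
import re

PILCROW = "\u00b6"  # temporary marker, as in the XSLT above

def strip_and_break(text: str) -> str:
    # Step 1: mark the start of each section heading
    marked = text.replace("<p><b>", PILCROW)
    # Step 2: delete every remaining tag; the non-greedy .*? stops at the first '>'
    stripped = re.sub(r"<.*?>", "", marked)
    # Step 3: turn each marker into a literal <br/>
    return stripped.replace(PILCROW, "<br/>")

sample = "intro <p><b>Section B: China</b></p> <p>Lorem</p>"
print(strip_and_break(sample))  # intro <br/>Section B: China Lorem

# Greedy vs non-greedy on a single line
print(re.sub(r"<.*>", "", "<p>Lorem</p>"))   # prints an empty line
print(re.sub(r"<.*?>", "", "<p>Lorem</p>"))  # Lorem
```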

Another option (even uglier).

An XSLT 3.0 solution using the parse-xml() function:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <!--standard identity template-->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="description">
        <xsl:copy>
            <!--Concatenate encoded <p> element to ensure that it is well-formed 
                XML with a document element when parsed.
                Use parse-xml() to parse the encoded markup as a parsed document.
                Apply-templates to the parsed document--> 
            <xsl:apply-templates select="parse-xml(concat('&lt;p&gt;', ., '&lt;/p&gt;'))"/>
        </xsl:copy>
    </xsl:template>

    <!-- remove <p> and <b> elements -->
    <xsl:template match="p | b">
        <xsl:apply-templates/>
    </xsl:template>

    <!--for every <p> element that has a <b> element, generate a <br/> -->
    <xsl:template match="p[b]">
        <br/>
        <xsl:apply-templates/>
    </xsl:template>
</xsl:stylesheet>
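To see what the parse-xml() templates do step by step, here is a rough Python analogue using the standard library's xml.etree.ElementTree. It assumes the description text has already been decoded to <p><b>... (which is what the string value "." is inside XSLT), and, to stay short, it returns a string containing a literal "<br/>" instead of building a real br element; flatten_parsed is a hypothetical name. As with parse-xml(), the wrapped fragment must be well-formed XML: an HTML entity such as &nbsp; or an unclosed <p> makes parsing fail.

```python
import xml.etree.ElementTree as ET

def flatten_parsed(decoded: str) -> str:
    # Wrap in <p> so the fragment has a single root, like concat('<p>', ., '</p>')
    root = ET.fromstring("<p>" + decoded + "</p>")
    out = []

    def visit(el: ET.Element) -> None:
        # Like the p[b] template: a <p> with a direct <b> child starts a new line
        if el.tag == "p" and el.find("b") is not None:
            out.append("<br/>")
        # Like the p | b template: drop the element itself but keep its content
        if el.text:
            out.append(el.text)
        for child in el:
            visit(child)
            if child.tail:  # text that follows the child element
                out.append(child.tail)

    visit(root)
    return "".join(out)

print(flatten_parsed("intro <p><b>Section B: China</b></p> <p>Lorem</p>"))
# intro <br/>Section B: China Lorem
```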

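An XSLT 2.0 solution that uses the tokenize() function to split the encoded HTML wherever the encoded <p><b> occurs. For each tokenized item, it creates a <br/> element (unless the item is the first in the sequence) and uses the replace() function to remove any remaining encoded markup from the item: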

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="description">
        <xsl:copy>
            <xsl:for-each select="tokenize(., '&lt;p&gt;&lt;b&gt;')">
                <xsl:if test="position()>1">
                    <br/>
                </xsl:if>
                <xsl:sequence select="replace(., '&lt;.*?&gt;', '')"/>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
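The tokenize()/replace() logic maps almost one-to-one onto Python's re.split() and re.sub(). The sketch below (flatten_tokenized is a hypothetical name, and it returns a plain string with a literal "<br/>" rather than an element) can be used to check the behaviour on sample text. Like XPath tokenize(), re.split() yields a leading empty string when the input begins with the separator, so a description that opens with a heading would start with a <br/>.

```python
import re

def flatten_tokenized(decoded: str) -> str:
    out = []
    # Like tokenize(., '<p><b>'): split wherever a section heading starts
    for i, item in enumerate(re.split(r"<p><b>", decoded)):
        if i > 0:  # like position() > 1
            out.append("<br/>")
        # Like replace(., '<.*?>', ''): strip any remaining tags from the item
        out.append(re.sub(r"<.*?>", "", item))
    return "".join(out)

print(flatten_tokenized("intro <p><b>Section B: China</b></p> <p>Lorem</p>"))
# intro <br/>Section B: China Lorem
```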

It outputs what the OP wants. I don't think using the identity template makes any difference when you match the only existing element. In any case, your result still has markup left after the section headings, e.g.

…China</b></p>

and around the following paragraphs:

<p>Lorem ipsum ... aliquyam erat.</p>

That doesn't match the OP's request: "I want the output to be clean, without the encoded <p> or <b> tags". As it happens, the input and output panels in my editor show the same XML. What's going on? Is this a valid approach?