Removing all HTML tags (except allowed ones) using an XSLT function


I am trying to clean up some data that I get from an RSS feed using XSLT. I want to remove all tags except the p tag.

 Cows are kool.<p>The <i>milk</i> <b>costs</b> $1.99.</p>

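I have a few questions about how to solve this with XSLT 1.0 or 2.0.

1) I have seen this example

but I need the p tag to remain, and I need to use a regular expression. Can we use a string-before and match function and do it in a similar way? I don't think such a function exists in XPath.

2) I know the replace function cannot be used for this, because it expects a string. If we pass it a node, the node's text content is extracted and passed to the function, and the tags are lost before any replacement happens, which defeats the purpose of stripping tags.

I am a bit confused, because this answer uses replace.

3) I am doing this with XSLT inside an nginx server.

Please find the sample input below; it sits inside the body tag of the RSS feed.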

<p>The Supreme Court issued on Friday a bailable warrant against sitting Calcutta high court justice CS Karnan, an unprecedented order in a bitter confrontation between the judge and the top court.</p><p>A seven-judge bench headed by Chief Justice of India JS Khehar issued the order directing Karnan’s presence on <h2>March 31</h2> because the judge ignored an earlier court order summoning him.<i>Justice Karnan</i> had to appear</p>

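Update: I am also looking for an XSLT function for this.

Assuming you can use XSLT 2.0, you could apply David Carlisle's HTML parser to the content of the body element and then process the resulting nodes in a mode that strips every element except p elements: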

<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:d="data:,dpc"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="d xhtml">

    <xsl:import href="htmlparse-by-dcarlisle.xsl"/>

    <xsl:template match="@*|node()" mode="#default strip">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="#current"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="body">
        <xsl:copy>
            <xsl:apply-templates select="d:htmlparse(., '', true())" mode="strip"/>
        </xsl:copy>
    </xsl:template>

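    <!-- Replace every non-p element by its children; stay in the strip mode so nested elements are stripped too. -->
    <xsl:template match="*[not(self::p)]" mode="strip">
        <xsl:apply-templates mode="#current"/>
    </xsl:template>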

</xsl:transform>

An input of

<rss>
    <entry>
        <body><![CDATA[<p>The Supreme Court issued on Friday a bailable warrant against sitting Calcutta high court justice CS Karnan, an unprecedented order in a bitter confrontation between the judge and the top court.</p><p>A seven-judge bench headed by Chief Justice of India JS Khehar issued the order directing Karnan’s presence on <h2>March 31</h2> because the judge ignored an earlier court order summoning him.<i>Justice Karnan</i> had to appear</p>]]></body>
    </entry>
</rss>


is then transformed into

<rss>
    <entry>
        <body><p>The Supreme Court issued on Friday a bailable warrant against sitting Calcutta high court justice CS Karnan, an unprecedented order in a bitter confrontation between the judge and the top court.</p><p>A seven-judge bench headed by Chief Justice of India JS Khehar issued the order directing Karnan’s presence on March 31 because the judge ignored an earlier court order summoning him.Justice Karnan had to appear</p></body>
    </entry>
</rss>
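
Since the question asks about a regular expression: when the markup is escaped text, as in the CDATA input above, the string value of body can also be edited directly with the XPath 2.0 replace function. The following is only a minimal sketch, not part of the original answer. The pattern is an assumption of this sketch and is deliberately naive: it removes anything that looks like a tag whose name is not exactly p, it would also eat text such as "a < b and c > d", and it does not handle comments or attribute values containing ">". The cdata-section-elements setting is likewise an assumption about the desired serialization.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <!-- Assumption: the result should carry the body text in a CDATA section again. -->
    <xsl:output cdata-section-elements="body"/>

    <!-- Identity template: copy everything else unchanged. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- body holds escaped markup, so its string value can be edited as text.
         The regex matches an opening or closing tag whose name either does not
         start with p/P, or starts with p/P followed by a further name character
         (for example pre), and replaces it with the empty string. -->
    <xsl:template match="body">
        <xsl:copy>
            <xsl:value-of select="replace(., '&lt;/?([^pP/&gt;][^&gt;]*|[pP][^\s/&gt;][^&gt;]*)&gt;', '')"/>
        </xsl:copy>
    </xsl:template>

</xsl:transform>
```

This avoids importing the parser module, but a real HTML parser remains the more robust choice for anything beyond simple, well-behaved markup.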


If the input is not escaped but is included as XML markup, there is no need to parse it; simply apply the mode to the content:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:template match="@*|node()" mode="#default strip">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="#current"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="body">
        <xsl:copy>
            <xsl:apply-templates select="node()" mode="strip"/>
        </xsl:copy>
    </xsl:template>

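    <!-- Replace every non-p element by its children; stay in the strip mode so nested elements are stripped too. -->
    <xsl:template match="*[not(self::p)]" mode="strip">
        <xsl:apply-templates mode="#current"/>
    </xsl:template>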

</xsl:transform>
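
The question mentions running the transformation inside nginx. Assuming that setup relies on libxslt, which implements XSLT 1.0 only, the mode="#default strip" and mode="#current" syntax used above would not be available. The following is a sketch of the same stripping idea restricted to XSLT 1.0 constructs; it is not from the original answer, and it only covers the case where the HTML is present as real markup, because XSLT 1.0 has no built-in way to parse escaped HTML.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Identity template: copy everything outside body unchanged. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Switch to the strip mode for the content of body. -->
    <xsl:template match="body">
        <xsl:copy>
            <xsl:apply-templates select="node()" mode="strip"/>
        </xsl:copy>
    </xsl:template>

    <!-- In strip mode, keep p elements and continue stripping inside them. -->
    <xsl:template match="p" mode="strip">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="strip"/>
        </xsl:copy>
    </xsl:template>

    <!-- Keep the attributes of the elements that survive. -->
    <xsl:template match="@*" mode="strip">
        <xsl:copy/>
    </xsl:template>

    <!-- Any other element is replaced by its processed children. -->
    <xsl:template match="*" mode="strip">
        <xsl:apply-templates select="node()" mode="strip"/>
    </xsl:template>

</xsl:stylesheet>
```

Text nodes inside body are copied by the built-in template rule for text, while comments and processing instructions inside body are dropped by the built-in rules. To allow further tags, the match pattern of the p template could be extended, for example to "p | br".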


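Please provide minimal but complete samples of the XML input and the corresponding result you want. We need to see whether the HTML in the RSS feed is included as markup or as text (in a CDATA section). We also need to know whether you expect the HTML to be parseable as XML or only as HTML.

@MartinHonnen Updated with a sample input. I need to return the content in the CDATA without any HTML tags except the p tag.

Thanks. Assume the HTML has already been parsed. Only the other tags need to be removed. I was not able to achieve this.

Works fine for me. You will have to edit your question and provide minimal but complete samples of the XML, the XSLT, and the output you want, plus the output you actually get or an exact error message, if you need further help.

Yes, it works. I am just looking at how to make it work for this input. Updated the question.

@Mortan. Thanks, I removed the mode strip and it worked. The HTML parsing is a good addition to what I needed. Also, can we create an xsl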