Wrapping multiple sequences of list elements using an XSL transformation (XML to XML)

I have some long input XML documents (about 3k lines each) that generally look like this:

<chapter someAttributes="someValues">
    <title>someTitle</title>

    <p>multiple paragraphs</p>
    <p>...</p>

    <li>
        <p>- some text</p>
    </li>
    <li>
        <p>- some other text</p>
    </li>
    <!-- another li elements -->

    <p>multiple other paragraphs</p>
    <p>...</p>

    <li>
        <p>1. some text</p>
    </li>
    <li>
        <p>2. some other text</p>
    </li>
    <!-- another li elements -->

    <p>multiple other paragraphs</p>
    <p>...</p>

    <!-- there are other elements such as table, illustration, ul etc. -->  
</chapter>

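As you can see, I also need to strip the "marker characters" from all of the list paragraphs, i.e. "-" or "1.", "2.", "3.", etc. The real input XML is more complex than what I have described (nested sequences, inner sequences inside table elements), but I am looking for ideas, in particular on how to catch and process specific sequences with these semantics.

I want the output XML to keep exactly the same order, only with the li elements wrapped. XSLT 2.0/EXSLT can be used if needed.

Here is an XSLT 2.0 stylesheet: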

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:output indent="yes"/>

    <!-- Identity template: copy anything not handled below. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*, node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Group adjacent children of chapter by whether they are li elements. -->
    <xsl:template match="chapter">
        <xsl:copy>
            <xsl:for-each-group select="*" group-adjacent="boolean(self::li)">
                <xsl:choose>
                    <!-- A run of li elements whose first item starts with a dash. -->
                    <xsl:when test="current-grouping-key() and ./p[1][starts-with(., '-')]">
                        <ul mark="DASH">
                            <xsl:apply-templates select="current-group()"/>
                        </ul>
                    </xsl:when>
                    <!-- A run of li elements whose first item contains a digit followed by a dot. -->
                    <xsl:when test="current-grouping-key() and ./p[1][matches(., '[0-9]\.')]">
                        <ol numeration="arabic">
                            <xsl:apply-templates select="current-group()"/>
                        </ol>
                    </xsl:when>
                    <!-- Everything else is copied unchanged. -->
                    <xsl:otherwise>
                        <xsl:copy-of select="current-group()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

    <!-- Strip the leading marker from the first text node of each list paragraph. -->
    <xsl:template match="li/p/text()[1]">
        <xsl:value-of select="replace(., '^(-|[0-9]\.)', '')"/>
    </xsl:template>

</xsl:stylesheet>

When I use Saxon 9.3 with that stylesheet and the sample input

<chapter someAttributes="someValues">
    <title>someTitle</title>

    <p>multiple paragraphs</p>
    <p>...</p>

    <li>
        <p>- some text</p>
    </li>
    <li>
        <p>- some other text</p>
    </li>

    <p>multiple other paragraphs</p>
    <p>...</p>

    <li>
        <p>1. some text</p>
    </li>
    <li>
        <p>2. some other text</p>
    </li>

    <p>multiple other paragraphs</p>
    <p>...</p>

</chapter>

I get the following output:

<?xml version="1.0" encoding="UTF-8"?>
<chapter>
    <title>someTitle</title>
    <p>multiple paragraphs</p>
    <p>...</p>
    <ul mark="DASH">
        <li>
            <p> some text</p>
        </li>
        <li>
            <p> some other text</p>
        </li>
    </ul>
    <p>multiple other paragraphs</p>
    <p>...</p>
    <ol numeration="arabic">
        <li>
            <p> some text</p>
        </li>
        <li>
            <p> some other text</p>
        </li>
    </ol>
    <p>multiple other paragraphs</p>
    <p>...</p>
</chapter>
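
For readers less familiar with group-adjacent, the grouping idea can be sketched outside XSLT. The following Python snippet is only an illustration under stated assumptions, not part of the answer: the function name wrap_lists and the MARKER pattern are made up here, the pattern also accepts multi-digit numbers and trailing whitespace (which the stylesheet above does not), and, unlike the stylesheet, the sketch keeps the attributes of chapter.

```python
import re
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical marker pattern: a dash or a number followed by a dot,
# plus surrounding whitespace (an assumption, broader than the stylesheet's regex).
MARKER = re.compile(r"^\s*(-|[0-9]+\.)\s*")


def wrap_lists(chapter):
    """Build a new chapter element in which each run of adjacent <li>
    children is wrapped in <ul> or <ol>. Child elements are reused and
    their marker text is modified in place."""
    result = ET.Element(chapter.tag, chapter.attrib)
    # groupby yields runs of consecutive children that share the same key,
    # which is roughly what group-adjacent="boolean(self::li)" does.
    for is_li, run in groupby(chapter, key=lambda child: child.tag == "li"):
        run = list(run)
        first_text = (run[0].findtext("p") or "").strip() if is_li else ""
        if is_li and first_text.startswith("-"):
            wrapper = ET.SubElement(result, "ul", mark="DASH")
        elif is_li and re.match(r"[0-9]+\.", first_text):
            wrapper = ET.SubElement(result, "ol", numeration="arabic")
        else:
            # Non-list runs (and unmarked lists) are copied through unchanged.
            result.extend(run)
            continue
        for li in run:
            p = li.find("p")
            if p is not None and p.text:
                p.text = MARKER.sub("", p.text, count=1)
            wrapper.append(li)
    return result
```

Calling wrap_lists(ET.fromstring(xml_text)) on a chapter like the sample input returns an element whose children keep their original order, with each li run nested inside a ul or ol.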

Here is a complete functional-style solution that does not use any procedural constructs such as xsl:for-each-group and xsl:if. It is XSLT 2.0, tested under Saxon-B 9.0.0.1J.

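<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:output indent="yes" method="html"/>
    <xsl:strip-space elements="*"/>

    <!-- Identity template: copy anything not handled below. -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- First li of a run that starts with a dash: open the ul and walk the run. -->
    <xsl:template match="li[(name(preceding-sibling::*[position()=1]) != name(current())) and matches(.,'^-')]">
        <ul mark="DASH">
            <li><xsl:apply-templates/></li>
            <xsl:apply-templates select="following-sibling::*[1][name() =name(current())]" mode="next"/>
        </ul>
    </xsl:template>

    <!-- First li of a run that starts with a digit and a dot: open the ol and walk the run. -->
    <xsl:template match="li[(name(preceding-sibling::*[position()=1]) != name(current())) and matches(.,'^[0-9]\.')]">
        <ol numeration="ARABIC">
            <li><xsl:apply-templates/></li>
            <xsl:apply-templates select="following-sibling::*[1][name() =name(current())]" mode="next"/>
        </ol>
    </xsl:template>

    <!-- Emit one item, then continue with the next sibling if it has the same name. -->
    <xsl:template match="*" mode="next">
        <li><xsl:apply-templates/></li>
        <xsl:apply-templates select="following-sibling::*[1][name() =name(current())]" mode="next"/>
    </xsl:template>

    <!-- Added fix: an li that follows another li is already emitted by the
         "next" chain, so suppress it in the default mode. Without this rule
         the identity template would copy it a second time. -->
    <xsl:template match="li[preceding-sibling::*[1][self::li]]"/>

    <!-- Strip the leading marker and the whitespace after it. -->
    <xsl:template match="li/p/text()[1]">
        <xsl:value-of select="replace(., '^(-|[0-9]\.)\s+', '')"/>
    </xsl:template>

</xsl:stylesheet>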

Applied to your input, it produces:

<chapter someAttributes="someValues">
    <title>someTitle</title>
    <p>multiple paragraphs</p>
    <p>...</p>
    <ul mark="DASH">
        <li>
            <p>some text</p>
        </li>
        <li>
            <p>some other text</p>
        </li>
    </ul>
    <p>multiple other paragraphs</p>
    <p>...</p>
    <ol numeration="ARABIC">
        <li>
            <p>some text</p>
        </li>
        <li>
            <p>some other text</p>
        </li>
    </ol>
    <p>multiple other paragraphs</p>
    <p>...</p>
</chapter>
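
The sibling-walking idea behind this second stylesheet can also be sketched in Python. This is a rough illustration only, and the helper names starts_run, collect_run and list_runs are hypothetical: a run starts at an li whose preceding sibling is not an li, and it is then extended one following sibling at a time for as long as the element name stays the same.

```python
import xml.etree.ElementTree as ET


def starts_run(siblings, i):
    """True when siblings[i] is an <li> that is not preceded by another <li>."""
    return siblings[i].tag == "li" and (i == 0 or siblings[i - 1].tag != "li")


def collect_run(siblings, i):
    """Take siblings[i], then recurse into the immediately following sibling
    while it has the same tag, similar to the mode="next" template chain."""
    run = [siblings[i]]
    if i + 1 < len(siblings) and siblings[i + 1].tag == siblings[i].tag:
        run.extend(collect_run(siblings, i + 1))
    return run


def list_runs(chapter):
    """Return every run of adjacent <li> children as a list of lists."""
    siblings = list(chapter)
    return [collect_run(siblings, i)
            for i in range(len(siblings))
            if starts_run(siblings, i)]
```

Each list returned by list_runs corresponds to the content of one ul or ol wrapper. The recursion depth grows with the length of a run, which is fine for short lists but worth keeping in mind for very long ones.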

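+1. I would like to get the same solution without using xsl:for-each-group.

Why? Is this some kind of personal challenge, like climbing Everest without oxygen, or is there a technical reason to avoid the construct that makes this problem easy to solve?

@Michael Kay: I don't know, but that seems to be your view.

@Honnen: As I hope you can see, there was no intention of questioning your answer (which has been chosen as the best answer).

@斯佩特科夫斯基: I know you have already accepted an answer, and perhaps you are happy with it. However, providing a fully functional answer (no loops, no conditionals) was a challenge for me. I have edited my answer, so now I am curious. Cheers

@empo: Thanks for your effort, both approaches work very well for me.

There is nothing remotely procedural about xsl:for-each-group or xsl:choose. I don't see that your solution is in any way superior to one that uses xsl:for-each-group: what advantages do you think it has?

@Michael Kay: I am not saying that my (humble) solution is superior. Forgive me if I gave that impression. I am now interested in this point because I have read comments along those lines directly from you and from others. For example, see [link]. If I have misunderstood things, please clarify them for me.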