Basic operations for modifying a source document with XSLT

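Every XSLT tutorial and example I have found seems to assume that the target will be a format or structure very different from the source, and that you know the structure of the source in advance. I am struggling to work out how to perform simple "in-place" modifications to an HTML document without knowing its existing structure.

Could someone give me a clear example of how, given an arbitrary, unknown HTML source, to: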

1.) delete the classname 'foo' from all divs
2.) delete a node if it's empty (ie <p></p>)
3.) delete a <p> node if its first child is <br>
4.) add newattr="newvalue" to all H1
5.) replace 'heading' in text nodes with 'title'
6.) wrap all <u> tags in <b> tags (ie, <u>foo</u> -> <b><u>foo</u></b>)
7.) output the transformed document without changing anything else

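The examples above are the main kinds of transformation I want to perform. Understanding how to do them will go a long way towards helping me build more complex transformations.

To help clarify and test the examples, here is a sample source and output. I must stress, though, that I want this to work on arbitrary input, without rewriting the XSLT for each source: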

<!doctype html>
<html>
<body>
  <h1>heading</h1>
  <p></p>
  <p><br>line</p>
  <div class="foo bar"><u>baz</u></div>
  <p>untouched</p>
</body>
</html>


Output:

<!doctype html>
<html>
<body>
  <h1 newattr="newvalue">title</h1>
  <div class="bar"><b><u>baz</u></b></div>
  <p>untouched</p>
</body>
</html>



1.) delete the classname 'foo' from all divs

<xsl:template match="div[contains(concat(' ', @class, ' '), ' foo ')]">
  <xsl:copy>
    <xsl:attribute name="class">
      <xsl:variable name="s" select="substring-before(concat(' ', @class, ' '), ' foo ')" />
      <xsl:variable name="e" select="substring-after(concat(' ', @class, ' '), ' foo ')" />
      <xsl:value-of select="normalize-space(concat($s, ' ', $e))" />
    </xsl:attribute>
    <!-- copy the children and every attribute except class, which was rebuilt above -->
    <xsl:apply-templates select="node() | @*[name() != 'class']" />
  </xsl:copy>
</xsl:template>
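The answer as preserved here shows no code for items 2 to 4 of the question. The following is a minimal sketch of how they could be written in XSLT 1.0, not necessarily what the answerer proposed. It assumes the HTML has already been parsed into a well-formed tree with elements in no namespace and lower-case names. The list of void elements excluded in the first template is an assumption added for illustration, so that `<br>` and similar elements are not treated as "empty".

```xml
<!-- 2.) drop elements that have no child nodes at all, e.g. <p></p>.
     Void elements such as <br> are excluded (assumed list, adjust as needed). -->
<xsl:template match="*[not(node())][not(self::br or self::hr or self::img
                                        or self::input or self::meta or self::link)]" />

<!-- 3.) drop a <p> whose very first child node is a <br> -->
<xsl:template match="p[node()[1][self::br]]" />

<!-- 4.) copy every <h1> and add newattr="newvalue".
     Attributes must be written before any child content. -->
<xsl:template match="h1">
  <xsl:copy>
    <xsl:apply-templates select="@*" />
    <xsl:attribute name="newattr">newvalue</xsl:attribute>
    <xsl:apply-templates select="node()" />
  </xsl:copy>
</xsl:template>
```

An empty template produces no output, which is how a node is "deleted". Note that `*[not(node())]` does not match an element containing only whitespace; `*[normalize-space() = '']` would, but it would also remove elements that contain only images or other non-text content.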

5.) replace 'heading' in text nodes with 'title'

<!-- This replaces the first occurrence of 'heading', case-sensitively.
     More generic search-and-replace templates are plenty, here on SO as well as 
     elsewhere on the 'net. -->
<xsl:template match="text()[contains(concat(' ', ., ' '), ' heading ')]">
  <xsl:variable name="s" select="substring-before(concat(' ', ., ' '), ' heading ')" />
  <xsl:variable name="e" select="substring-after(concat(' ', ., ' '), ' heading ')" />
  <!-- join the text before and after the match, with 'title' in between -->
  <xsl:value-of select="normalize-space(concat($s, ' title ', $e))" />
</xsl:template>
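The template above only replaces the first whole-word occurrence. As a rough sketch of the more generic search-and-replace that the comment alludes to, a recursive named template can replace every occurrence in XSLT 1.0. The name `replace-all` and its parameters `text`, `from` and `to` are made up for this illustration. It would be used instead of the template above, not alongside it, since both match the same text nodes.

```xml
<!-- Hypothetical helper: replaces every occurrence of $from in $text with $to -->
<xsl:template name="replace-all">
  <xsl:param name="text" />
  <xsl:param name="from" />
  <xsl:param name="to" />
  <xsl:choose>
    <!-- the empty-string guard prevents endless recursion -->
    <xsl:when test="$from != '' and contains($text, $from)">
      <xsl:value-of select="substring-before($text, $from)" />
      <xsl:value-of select="$to" />
      <!-- recurse on whatever follows the first match -->
      <xsl:call-template name="replace-all">
        <xsl:with-param name="text" select="substring-after($text, $from)" />
        <xsl:with-param name="from" select="$from" />
        <xsl:with-param name="to" select="$to" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template match="text()[contains(., 'heading')]">
  <xsl:call-template name="replace-all">
    <xsl:with-param name="text" select="." />
    <xsl:with-param name="from" select="'heading'" />
    <xsl:with-param name="to" select="'title'" />
  </xsl:call-template>
</xsl:template>
```

Unlike the whole-word version, this matches substrings, so 'headings' becomes 'titles', and it leaves whitespace untouched. With an XSLT 2.0 processor the whole helper reduces to `replace(., 'heading', 'title')`.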

6.) wrap all <u> tags in <b> tags (ie, <u>foo</u> -> <b><u>foo</u></b>)
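The code for this item did not survive in this copy of the answer. One plausible way to do it, shown here as a sketch rather than as the answerer's exact code, is to emit a literal `<b>` element around a copy of each `<u>`:

```xml
<!-- wrap every <u> in a <b>, keeping the <u>, its attributes and its content -->
<xsl:template match="u">
  <b>
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </b>
</xsl:template>
```

Because the content is processed with `xsl:apply-templates` rather than copied wholesale, the other rules still apply to anything nested inside the `<u>`.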

7.) output the transformed document without changing anything else

<!-- the identity template copies everything that is not handled by 
     any of the more specific templates above -->
<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="node() | @*" />
  </xsl:copy>
</xsl:template>
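The templates in this answer are fragments; to run them they have to sit inside a stylesheet element. A minimal skeleton might look like the following, where the `xsl:output` settings are an assumption chosen for HTML output rather than something the answer specifies:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- serialize as HTML and do not re-indent the result -->
  <xsl:output method="html" indent="no" />

  <!-- the more specific templates (items 1 to 6) go here -->

  <!-- identity template: copies everything not handled elsewhere -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Two caveats apply to the sample in the question. An XSLT processor needs a well-formed tree as input, so HTML with unclosed tags such as `<br>` has to go through an HTML-aware parser first. The doctype declaration is also not a node in the data model, so the identity template does not copy it.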

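When more than one template can match the same node, template specificity and order determine which template "wins".

Specificity means: "of several competing templates, the one with the more complex match rule wins."

Order means: "of several competing templates with the same specificity, the one that comes later in the XSLT document wins."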
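In XSLT 1.0 terms, "specificity" is the default priority of the match pattern: a bare node test such as `node()` or `*` gets -0.5, a plain element name such as `h1` gets 0, and a pattern with a predicate gets 0.5. A tie is formally an error, which processors may recover from by picking the last matching template. When the defaults give the wrong winner, the `priority` attribute overrides them. The two rules below are hypothetical and only illustrate the mechanism:

```xml
<!-- Hypothetical rule: drop elements with no child nodes.
     The pattern has a predicate, so its default priority is 0.5. -->
<xsl:template match="*[not(node())]" />

<!-- match="h1" alone would default to priority 0 and lose to the rule above
     for an empty <h1>. The explicit priority makes this template win instead. -->
<xsl:template match="h1" priority="1">
  <xsl:copy>
    <xsl:apply-templates select="@*" />
    <xsl:attribute name="newattr">newvalue</xsl:attribute>
    <xsl:apply-templates select="node()" />
  </xsl:copy>
</xsl:template>
```

With these two rules, an empty `<h1>` is kept and gains the attribute; without `priority="1"` it would be deleted.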

Thanks very much, very good explanation. I now see why XSLT is so unpopular. I would never have imagined that a simple string replace could take up to 10 lines of code, or that even trivial operations need code that makes Perl look clear and legible.

@Tomalak, this is a good solution (+1), with only a few flaws. As for the ignorant, don't worry about them: one cannot change them, and they are their own biggest problem. Nothing is more harmful than a hard-working fool. Fortunately, what they cannot understand they cannot touch and ruin.

@Dimitre: Thanks for the praise. I think you are a bit too harsh in the rest of your comment. XSLT is a shock to anyone who is not used to the concept, and I can understand it when people consider it inferior to what they already know.

@SpliFF: Perl is probably one of the worst choices to compare XSLT against. ;) (Personally, I find Perl an ugly, messy, write-only language. It is expressive and useful, but I refuse to use it because its ugliness puts me off.)

Perl was designed around string processing, and it really is good at that. XSLT was designed around processing XML structure, and it is very good at that. String processing was not one of its design goals, though it is much better in XSLT 2.0. I am sure a 10-line XSLT program would take ten times as much Perl code.

@SpliFF, this is XSLT at its best, so please don't use words that belittle it. [+1 for Tomalak]
