PHP XPath: recursively remove empty DOM nodes?
I'm trying to find a way to clean up a bunch of empty DOM elements from an HTML source like this:
<div class="empty">
<div>&nbsp;</div>
<div></div>
</div>
<a href="http://example.com">good</a>
<div>
<p></p>
</div>
<br>
<img src="http://example.com/logo.png" />
<div></div>
Desired result:
<a href="http://example.com">good</a>
<br>
<img src="http://example.com/logo.png" />
$xpath = new DOMXPath($dom);
//$x = '//*[not(*) and not(normalize-space(.))]';
//$x = '//*[not(text() or node() or self::br)]';
//$x = 'not(normalize-space(.) or self::br)';
$x = '//*[not(text() or node() or self::br)]';
while (($nodeList = $xpath->query($x)) && $nodeList->length > 0) {
    foreach ($nodeList as $node) {
        $node->parentNode->removeChild($node);
    }
}
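Can someone show me the correct XPath to remove empty DOM nodes, i.e. nodes that serve no purpose when they are empty? (img, br, and input serve a purpose even when empty.)
Current output: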
<div>
<div>&nbsp;</div>
</div>
<a href="http://example.com">good</a>
<div>
</div>
<br>
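Update
To clarify, I'm looking for an XPath query that:
- recursively matches empty nodes until all of them are found (including the parents of empty nodes)
- can be run successfully multiple times, once after each cleanup (as in my example)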
*[not(*) and not(text()[normalize-space()])]
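where
not(*) = the element has no child elements
text()[normalize-space()] = the element has a text node containing non-whitespace text (negated here by the enclosing not())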
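Because this expression only matches leaf elements, it has to be applied repeatedly until nothing matches any more. The following is a minimal PHP sketch of that loop, not the answerer's code. The function name removeEmptyLeaves, the exclusion of br, img and input, and the handling of non-breaking spaces are assumptions added for the sample markup from the question.

```php
<?php
// Repeatedly removes empty leaf elements until a pass finds none.
// Returns the number of passes that removed at least one node.
function removeEmptyLeaves(DOMDocument $dom): int
{
    $xpath = new DOMXPath($dom);
    $nbsp = "\u{00A0}"; // loadHTML() decodes &nbsp; to this character
    $query = "//*[not(*)"
           . " and not(text()[normalize-space(translate(., '$nbsp', ''))])"
           . " and not(self::br or self::img or self::input)]";
    $passes = 0;
    while (true) {
        // Copy the result into an array before modifying the tree.
        $nodes = iterator_to_array($xpath->query($query));
        if (count($nodes) === 0) {
            break;
        }
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
        $passes++;
    }
    return $passes;
}

$html = '<div class="empty"><div>&nbsp;</div><div></div></div>'
      . '<a href="http://example.com">good</a>'
      . '<div><p></p></div><br>'
      . '<img src="http://example.com/logo.png" /><div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html); // wraps the fragment in <html><body>
$passes = removeEmptyLeaves($dom);
echo $dom->saveHTML($dom->getElementsByTagName('body')->item(0)), "\n";
```

Each pass removes only the innermost empty elements, so a parent that becomes empty is picked up by the next pass.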
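XPath is a query language for XML documents. As such, evaluating an XPath expression only selects nodes or extracts non-node information from the XML document; it never alters the document. Evaluating an XPath expression therefore never deletes or inserts nodes, and the XML document remains unchanged.
What you want is to "clean up a bunch of empty DOM elements from an HTML source", and that cannot be done with XPath alone.
This is confirmed by the most credible and only official (normative) source on XPath: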
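"The primary purpose of XPath is to address parts of an XML [XML] document. In support of this primary purpose, it also provides basic facilities for manipulation of strings, numbers and booleans. XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax. XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document."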
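Therefore, some additional language must be used together with XPath to implement the required functionality.
XSLT is a language designed specifically for XML transformation.
Here is an XSLT-based example, a short and simple XSLT transformation that performs the requested cleanup: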
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
  "*[not(string(translate(., '&#xA0;', '')))
    and
     not(descendant-or-self::*
           [self::img or self::input or self::br])]"/>
</xsl:stylesheet>
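When this transformation is applied to the sample input (wrapped in an html element so that it is well-formed XML), the result is:
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>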
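Since the question is about PHP, here is a hedged sketch of how a stylesheet like the one above could be run with PHP's XSLTProcessor. It assumes the xsl extension is installed; the variable names and the well-formed sample input are illustrative and not part of the original answer.

```php
<?php
// XSLTProcessor is provided by the optional xsl extension.
if (!extension_loaded('xsl')) {
    exit("The xsl extension is not available.\n");
}

// Nowdoc: the stylesheet text is taken literally, no PHP interpolation.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
 <xsl:template match=
  "*[not(string(translate(., '&#xA0;', '')))
    and
     not(descendant-or-self::*
           [self::img or self::input or self::br])]"/>
</xsl:stylesheet>
XSL;

// The question's markup, made well-formed: one root element,
// self-closed br/img, and &#xA0; instead of the HTML-only &nbsp;.
$xml = '<html><div class="empty"><div>&#xA0;</div><div></div></div>'
     . '<a href="http://example.com">good</a>'
     . '<div><p></p></div><br/>'
     . '<img src="http://example.com/logo.png"/><div></div></html>';

$style = new DOMDocument();
$style->loadXML($xsl);
$input = new DOMDocument();
$input->loadXML($xml);

$proc = new XSLTProcessor();
$proc->importStylesheet($style);
$cleaned = $proc->transformToXml($input);
echo $cleaned;
```

The transformation returns a new serialized document and leaves $input untouched, which matches the point that selection and modification are separate steps.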
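Update: for the clarified requirement, here is a single XPath expression that selects exactly the top-most nodes that must be deleted (&#xA0; is the XML character reference for the non-breaking space; outside an XML document, use the literal U+00A0 character instead):
//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
   not(descendant-or-self::*[self::img or self::input or self::br])
   ]
    [not(ancestor::*
           [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                       and
                        not(descendant-or-self::*
                              [self::img or self::input or self::br])
                        ]
                  )
           =
            count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                    and
                     not(descendant-or-self::*
                           [self::img or self::input or self::br])
                     ]
                 )
           ]
         )
    ]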
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "//*[not(normalize-space((translate(., '&#xA0;', ''))))
    and
     not(descendant-or-self::*[self::img or self::input or self::br])
     ]
      [not(ancestor::*
             [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                         and
                          not(descendant-or-self::*
                                [self::img or self::input or self::br])
                          ]
                    )
             =
              count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                      and
                       not(descendant-or-self::*
                             [self::img or self::input or self::br])
                       ]
                   )
             ]
           )
      ]
 "/>
</xsl:stylesheet>
Applied to the same input, this transformation produces the same result:
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
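Explanation:
The elements to delete are those whose string value is empty or consists only of whitespace and non-breaking spaces, and that have no img, input, or br among themselves and their descendants. All such elements are selected by:
//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
   not(descendant-or-self::*
         [self::img or self::input or self::br])
   ]
Let us denote this node-set as $vAllEmpty.
To delete all of these nodes, we only need to remove the "top nodes" of $vAllEmpty, because removing a node also removes its descendants.
Let us denote the set of all such "top nodes" as $vTopEmpty.
$vTopEmpty can be expressed from $vAllEmpty using the following XPath 2.0 expression:
$vAllEmpty[not(ancestor::* intersect $vAllEmpty)]
This selects those nodes from $vAllEmpty that have no ancestor element that is also in $vAllEmpty.
This last expression has an equivalent XPath 1.0 expression:
$vAllEmpty[not(ancestor::*[count(.|$vAllEmpty) = count($vAllEmpty)])]
Now we replace $vAllEmpty in this expression with its expanded XPath expression defined above, which gives the final expression that selects only the "top nodes to delete":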
//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
   not(descendant-or-self::*[self::img or self::input or self::br])
   ]
    [not(ancestor::*
           [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                       and
                        not(descendant-or-self::*
                              [self::img or self::input or self::br])
                        ]
                  )
           =
            count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                    and
                     not(descendant-or-self::*
                           [self::img or self::input or self::br])
                     ]
                 )
           ]
         )
    ]
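As a rough illustration of how this final XPath 1.0 expression might be used from PHP, the sketch below builds it from the two parts described above and removes the selected nodes in a single pass. The variable names ($allEmpty, $topEmpty, $removed) are made up for this example, and the literal U+00A0 character is used because the expression lives in a PHP string rather than in an XML document.

```php
<?php
$html = '<div class="empty"><div>&nbsp;</div><div></div></div>'
      . '<a href="http://example.com">good</a>'
      . '<div><p></p></div><br>'
      . '<img src="http://example.com/logo.png" /><div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html); // wraps the fragment in <html><body>
$xpath = new DOMXPath($dom);

$nbsp = "\u{00A0}"; // loadHTML() decodes &nbsp; to this character

// All "empty" elements (the node-set called $vAllEmpty in the text).
$allEmpty = "//*[not(normalize-space(translate(., '$nbsp', '')))"
          . " and not(descendant-or-self::*[self::img or self::input or self::br])]";

// Only those with no ancestor in the same set. count(.|S) = count(S)
// is the XPath 1.0 test for "the context node is a member of S".
$topEmpty = $allEmpty
          . "[not(ancestor::*[count(. | $allEmpty) = count($allEmpty)])]";

$removed = 0;
foreach (iterator_to_array($xpath->query($topEmpty)) as $node) {
    $node->parentNode->removeChild($node);
    $removed++;
}

echo $removed, " top-level empty elements removed\n";
echo $dom->saveHTML($dom->getElementsByTagName('body')->item(0)), "\n";
```

Because only the top nodes are selected, no selected node is a descendant of another one, so a single pass is enough.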
A short XSLT 2.0-based verification using variables:
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:variable name="vAllEmpty" select=
  "//*[not(normalize-space((translate(., '&#xA0;', ''))))
     and
      not(descendant-or-self::*
            [self::img or self::input or self::br])
      ]"/>

 <xsl:variable name="vTopEmpty" select=
  "$vAllEmpty[not(ancestor::* intersect $vAllEmpty)]"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*[. intersect $vTopEmpty]"/>
</xsl:stylesheet>
III. Alternative solution (may require "multiple cleanups"):
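An alternative approach is not to specify the nodes to delete, but to specify the nodes to keep. The nodes to delete are then the set difference between all nodes and the nodes to keep.
The nodes to keep are selected by this XPath expression: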
//node()
  [self::input or self::img or self::br
  or
   self::text()[normalize-space(translate(.,'&#xA0;',''))]
  ]
   /ancestor-or-self::node()
Then the nodes to delete are:
//node()
  [not(count(.
            |
             //node()
               [self::input or self::img or self::br
               or
                self::text()[normalize-space(translate(.,'&#xA0;',''))]
               ]
                /ancestor-or-self::node()
            )
      =
       count(//node()
               [self::input or self::img or self::br
               or
                self::text()[normalize-space(translate(.,'&#xA0;',''))]
               ]
                /ancestor-or-self::node()
            )
      )
  ]
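However, note that these are all of the nodes to delete, not only the "top nodes to delete". It is possible to express only the "top nodes to delete", but the resulting expression is rather complicated. If you attempt to delete every node in this set, errors will occur, because the descendants of the "top nodes to delete" follow them in document order.

The simplest way to achieve the desired result is to use a regular expression on the text. Note: you have to apply this expression several times, because it is not greedy and removes only the innermost empty child nodes. To remove all empty nodes, the regular expression has to be called repeatedly. Here is the solution: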
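<?php
$text = '<div class="empty">
<div>&nbsp;</div>
<div></div>
</div>
<a href="http://example.com">good</a>
<div>
<p></p>
</div>
<br>
<img src="http://example.com/logo.png" />
<div></div>';

// Recursive function: strips the innermost empty <div> elements, then
// calls itself again until a pass no longer changes the text.
function recreplace($text)
{
    $restext = preg_replace("/<div(.*)?>((\s|&nbsp;)*|(\s|&nbsp;)*<p>(\s|&nbsp;)*<\/p>(\s|&nbsp;)*)*<\/div>/U", '', $text);
    if ($text !== $restext) {
        // Return the result of the recursive call; without the return,
        // every call that recurses would yield null.
        return recreplace($restext);
    }
    return $restext;
}

print recreplace($text);
?>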
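So you want the text nodes, <br> and <img>, and their ancestors?
You can get all br and img elements with //br and //img.
You can get all text nodes with //text(), and all non-empty text nodes with //text()[normalize-space()]. (You may need something like //text()[normalize-space(translate(., '&#xA0;', ''))] to filter out text nodes that contain only non-breaking spaces, if your XML parser does not already do that.)
You can get all of their ancestors with ancestor-or-self::*.
So the resulting expression is:
//br/ancestor-or-self::* | //img/ancestor-or-self::* | //text()[normalize-space()]/ancestor-or-self::*
In XPath 2 this can be shortened to:
(//br | //img | //text()[normalize-space()])/ancestor-or-self::*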
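A possible way to turn this "nodes to keep" expression into an actual cleanup in PHP is sketched below. It is an illustration under assumptions, not the answerer's code: input is added to the kept elements to match the question, non-breaking spaces are stripped before the whitespace test, and getNodePath() is used as a simple node identity key. Only elements that are not kept but whose parent is kept are removed, so no attempt is made to remove a node whose ancestor is already gone.

```php
<?php
$html = '<div class="empty"><div>&nbsp;</div><div></div></div>'
      . '<a href="http://example.com">good</a>'
      . '<div><p></p></div><br>'
      . '<img src="http://example.com/logo.png" /><div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html); // wraps the fragment in <html><body>
$xpath = new DOMXPath($dom);

$nbsp = "\u{00A0}"; // loadHTML() decodes &nbsp; to this character
$keepExpr = "//br/ancestor-or-self::* | //img/ancestor-or-self::*"
          . " | //input/ancestor-or-self::*"
          . " | //text()[normalize-space(translate(., '$nbsp', ''))]/ancestor-or-self::*";

// Index the elements to keep by their unique document path.
$keep = [];
foreach ($xpath->query($keepExpr) as $node) {
    $keep[$node->getNodePath()] = true;
}

// Collect the top-most elements to delete: not kept, but parent kept.
// All paths are computed before the tree is modified.
$toRemove = [];
foreach ($xpath->query('//*') as $el) {
    $parent = $el->parentNode;
    $isKept = isset($keep[$el->getNodePath()]);
    $parentKept = $parent instanceof DOMElement
        && isset($keep[$parent->getNodePath()]);
    if (!$isKept && $parentKept) {
        $toRemove[] = $el;
    }
}

foreach ($toRemove as $el) {
    $el->parentNode->removeChild($el);
}

echo count($toRemove), " elements removed\n";
echo $dom->saveHTML($dom->getElementsByTagName('body')->item(0)), "\n";
```

The set difference is computed in PHP rather than in XPath, which keeps the XPath expression short at the cost of one extra pass over all elements.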
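Two points: 1) Because some of the top-level nodes to be removed are not actually empty (they have child nodes, one of which contains a non-breaking space entity, which is technically real content), you have to run the query repeatedly until no nodes are left (as you seem to have realized). This can be computationally very expensive if there are many deeply nested levels to remove. 2) Removing empty nodes is not always safe. It can easily break CSS rules that rely on those elements for correct spacing and floats.
@DaveRandom, good point, I had not considered CSS rules. For my use case that is not a problem, however, and neither is the extra computation time. These DOM structures are not used for display to users.
The while loop seems to stop while there are still DOM elements left to clean up.
What output is it currently producing?
Which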