Php XPath递归删除空DOM节点?

Php XPath递归删除空DOM节点?,php,dom,xpath,Php,Dom,Xpath,我试图找到一种方法,从HTML源中清除一堆空DOM元素,如下所示: <div class="empty"> <div>&nbsp;</div> <div></div> </div> <a href="http://example.com">good</a> <div> <p></p> </div> <br> &

我试图找到一种方法,从HTML源中清除一堆空DOM元素,如下所示:

<div class="empty">
    <div>&nbsp;</div>
    <div></div>
</div>
<a href="http://example.com">good</a>
<div>
    <p></p>
</div>
<br>
<img src="http://example.com/logo.png" />
<div></div>
<a href="http://example.com">good</a>
<br>
<img src="http://example.com/logo.png" />
$xpath = new DOMXPath($dom);

//$x = '//*[not(*) and not(normalize-space(.))]';
//$x = '//*[not(text() or node() or self::br)]';
//$x = 'not(normalize-space(.) or self::br)';
$x = '//*[not(text() or node() or self::br)]';

while(($nodeList = $xpath->query($x)) && $nodeList->length > 0) {
    foreach ($nodeList as $node) {
        $node->parentNode->removeChild($node);
    }
}
有人能告诉我正确的XPath来删除空的DOM节点吗?如果是空的,这些节点就没有任何用途了?(img、br和input即使为空也有其作用)

电流输出:

<div>
    <div>&nbsp;</div>

</div>
<a href="http://example.com">good</a>
<div>

</div>
<br>


更新 为了澄清,我正在寻找一个XPath查询:

  • 递归匹配空节点,直到找到所有节点(包括空节点的父节点)
  • 可以在每次清理后成功运行多次(如我的示例所示)

您是否尝试过类似的XPath

*[not(*) and not(text()[normalize-space()])]

  • 非(*)
    =无子元素
  • text()[normalize-space()]
    =包含带有非空白文本的节点(不与此相反)

I.初始解决方案:

//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
    not(descendant-or-self::*[self::img or self::input or self::br])
    ]
     [not(ancestor::*
             [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                         and
                           not(descendant-or-self::*
                                  [self::img or self::input or self::br])
                          ]
                    )
             =
              count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                      and
                        not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                        ]
                   )
              ]
          )
     ]
XPath是XML文档的查询语言。因此,对XPath表达式的求值仅选择节点或从XML文档中提取非节点信息,而不会更改XML文档。因此,计算XPath表达式不会删除或插入节点——XML文档保持不变

您想要的是“从HTML源中清除一堆空DOM元素”,而不能仅使用XPath来完成

XPath上最可靠也是唯一官方(我们称之为规范性)的来源证实了这一点--

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "*[not(string(translate(., '&#xA0;', '')))
  and
    not(descendant-or-self::*
          [self::img or self::input or self::br])]"/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
    not(descendant-or-self::*[self::img or self::input or self::br])
    ]
     [not(ancestor::*
             [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                         and
                           not(descendant-or-self::*
                                  [self::img or self::input or self::br])
                          ]
                    )
             =
              count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                      and
                        not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                        ]
                   )
              ]
          )
     ]
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
   "//*[not(normalize-space((translate(., '&#xA0;', ''))))
      and
        not(descendant-or-self::*[self::img or self::input or self::br])
       ]
        [not(ancestor::*
               [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                           and
                             not(descendant-or-self::*
                                    [self::img or self::input or self::br])
                             ]
                      )
               =
                count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                        and
                          not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                          ]
                      )
               ]
            )
        ]
 "/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:variable name="vAllEmpty" select=
      "//*[not(normalize-space((translate(., '&#xA0;', ''))))
         and
           not(descendant-or-self::*
                 [self::img or self::input or self::br])

          ]"/>

     <xsl:variable name="vTopEmpty" select=
     "$vAllEmpty[not(ancestor::* intersect $vAllEmpty)]"/>

     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="*[. intersect $vTopEmpty]"/>
</xsl:stylesheet>
  //node()
    [self::input or self::img or self::br
    or
     self::text()[normalize-space(translate(.,'&#xA0;',''))]
    ]
     /ancestor-or-self::node()
  //node()
     [not(count(.
              |
                //node() 
                   [self::input or self::img or self::br
                  or
                    self::text()[normalize-space(translate(.,'&#xA0;',''))]
                   ]
                    /ancestor-or-self::node()
                )
        =
         count(//node()
                  [self::input or self::img or self::br
                 or
                   self::text()[normalize-space(translate(.,'&#xA0;',''))]
                  ]
                   /ancestor-or-self::node()
               )
         )
     ]
“XPath的主要目的是处理XML[XML]的部分内容 为了支持这一主要目的,它还提供了基本的 用于处理字符串、数字和布尔值的工具 使用紧凑的非XML语法来促进在URI中使用XPath XPath对抽象的、逻辑的 XML文档的结构,而不是其表面语法。XPath 从其在URL中使用的路径表示法获取其名称 浏览XML文档的层次结构。”

因此,为了实现require功能,必须结合XPath使用一些额外的语言

XSLT是一种专门为XML转换而设计的语言

这里是一个基于XSLT的示例——一个简短而简单的XSLT转换,用于执行所请求的清理工作

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "*[not(string(translate(., '&#xA0;', '')))
  and
    not(descendant-or-self::*
          [self::img or self::input or self::br])]"/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
    not(descendant-or-self::*[self::img or self::input or self::br])
    ]
     [not(ancestor::*
             [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                         and
                           not(descendant-or-self::*
                                  [self::img or self::input or self::br])
                          ]
                    )
             =
              count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                      and
                        not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                        ]
                   )
              ]
          )
     ]
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
   "//*[not(normalize-space((translate(., '&#xA0;', ''))))
      and
        not(descendant-or-self::*[self::img or self::input or self::br])
       ]
        [not(ancestor::*
               [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                           and
                             not(descendant-or-self::*
                                    [self::img or self::input or self::br])
                             ]
                      )
               =
                count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                        and
                          not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                          ]
                      )
               ]
            )
        ]
 "/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:variable name="vAllEmpty" select=
      "//*[not(normalize-space((translate(., '&#xA0;', ''))))
         and
           not(descendant-or-self::*
                 [self::img or self::input or self::br])

          ]"/>

     <xsl:variable name="vTopEmpty" select=
     "$vAllEmpty[not(ancestor::* intersect $vAllEmpty)]"/>

     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="*[. intersect $vTopEmpty]"/>
</xsl:stylesheet>
  //node()
    [self::input or self::img or self::br
    or
     self::text()[normalize-space(translate(.,'&#xA0;',''))]
    ]
     /ancestor-or-self::node()
  //node()
     [not(count(.
              |
                //node() 
                   [self::input or self::img or self::br
                  or
                    self::text()[normalize-space(translate(.,'&#xA0;',''))]
                   ]
                    /ancestor-or-self::node()
                )
        =
         count(//node()
                  [self::input or self::img or self::br
                 or
                   self::text()[normalize-space(translate(.,'&#xA0;',''))]
                  ]
                   /ancestor-or-self::node()
               )
         )
     ]
基于XSLT的验证

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "*[not(string(translate(., '&#xA0;', '')))
  and
    not(descendant-or-self::*
          [self::img or self::input or self::br])]"/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
    not(descendant-or-self::*[self::img or self::input or self::br])
    ]
     [not(ancestor::*
             [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                         and
                           not(descendant-or-self::*
                                  [self::img or self::input or self::br])
                          ]
                    )
             =
              count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                      and
                        not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                        ]
                   )
              ]
          )
     ]
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
   "//*[not(normalize-space((translate(., '&#xA0;', ''))))
      and
        not(descendant-or-self::*[self::img or self::input or self::br])
       ]
        [not(ancestor::*
               [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                           and
                             not(descendant-or-self::*
                                    [self::img or self::input or self::br])
                             ]
                      )
               =
                count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                        and
                          not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                          ]
                      )
               ]
            )
        ]
 "/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:variable name="vAllEmpty" select=
      "//*[not(normalize-space((translate(., '&#xA0;', ''))))
         and
           not(descendant-or-self::*
                 [self::img or self::input or self::br])

          ]"/>

     <xsl:variable name="vTopEmpty" select=
     "$vAllEmpty[not(ancestor::* intersect $vAllEmpty)]"/>

     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="*[. intersect $vTopEmpty]"/>
</xsl:stylesheet>
  //node()
    [self::input or self::img or self::br
    or
     self::text()[normalize-space(translate(.,'&#xA0;',''))]
    ]
     /ancestor-or-self::node()
  //node()
     [not(count(.
              |
                //node() 
                   [self::input or self::img or self::br
                  or
                    self::text()[normalize-space(translate(.,'&#xA0;',''))]
                   ]
                    /ancestor-or-self::node()
                )
        =
         count(//node()
                  [self::input or self::img or self::br
                 or
                   self::text()[normalize-space(translate(.,'&#xA0;',''))]
                  ]
                   /ancestor-or-self::node()
               )
         )
     ]
要删除所有这些节点,我们只需要从
$vAllEmpty

让我们将所有此类“顶部节点”的集合表示为:
$vtopenty

$vTopEmpty
可以使用以下XPath 2.0表达式从
$vAllEmpty
表达:

   //*[not(normalize-space((translate(., '&#xA0;', ''))))
     and
       not(descendant-or-self::*
             [self::img or self::input or self::br])

      ]
$vAllEmpty[not(ancestor::* intersect $vAllEmpty)]
$vAllEmpty[not(ancestor::*[count(.|$vAllEmpty) = count($vAllEmpty)])]
这将从
$vAllEmpty
中选择那些没有任何祖先元素的节点,这些祖先元素也在
$vAllEmpty

最后一个XPath表达式具有等效的XPath 1.0表达式:

   //*[not(normalize-space((translate(., '&#xA0;', ''))))
     and
       not(descendant-or-self::*
             [self::img or self::input or self::br])

      ]
$vAllEmpty[not(ancestor::* intersect $vAllEmpty)]
$vAllEmpty[not(ancestor::*[count(.|$vAllEmpty) = count($vAllEmpty)])]
现在,我们用上面定义的扩展XPath表达式替换最后一个表达式
$vAllEmpty
,这就是我们获得最终表达式的方式,该表达式仅选择“要删除的顶部节点”:

//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
    not(descendant-or-self::*[self::img or self::input or self::br])
    ]
     [not(ancestor::*
             [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                         and
                           not(descendant-or-self::*
                                  [self::img or self::input or self::br])
                          ]
                    )
             =
              count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                      and
                        not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                        ]
                   )
              ]
          )
     ]
使用变量进行基于XSLT-2.0的简短验证

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "*[not(string(translate(., '&#xA0;', '')))
  and
    not(descendant-or-self::*
          [self::img or self::input or self::br])]"/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
    not(descendant-or-self::*[self::img or self::input or self::br])
    ]
     [not(ancestor::*
             [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                         and
                           not(descendant-or-self::*
                                  [self::img or self::input or self::br])
                          ]
                    )
             =
              count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                      and
                        not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                        ]
                   )
              ]
          )
     ]
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
   "//*[not(normalize-space((translate(., '&#xA0;', ''))))
      and
        not(descendant-or-self::*[self::img or self::input or self::br])
       ]
        [not(ancestor::*
               [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                           and
                             not(descendant-or-self::*
                                    [self::img or self::input or self::br])
                             ]
                      )
               =
                count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                        and
                          not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                          ]
                      )
               ]
            )
        ]
 "/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:variable name="vAllEmpty" select=
      "//*[not(normalize-space((translate(., '&#xA0;', ''))))
         and
           not(descendant-or-self::*
                 [self::img or self::input or self::br])

          ]"/>

     <xsl:variable name="vTopEmpty" select=
     "$vAllEmpty[not(ancestor::* intersect $vAllEmpty)]"/>

     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="*[. intersect $vTopEmpty]"/>
</xsl:stylesheet>
  //node()
    [self::input or self::img or self::br
    or
     self::text()[normalize-space(translate(.,'&#xA0;',''))]
    ]
     /ancestor-or-self::node()
  //node()
     [not(count(.
              |
                //node() 
                   [self::input or self::img or self::br
                  or
                    self::text()[normalize-space(translate(.,'&#xA0;',''))]
                   ]
                    /ancestor-or-self::node()
                )
        =
         count(//node()
                  [self::input or self::img or self::br
                 or
                   self::text()[normalize-space(translate(.,'&#xA0;',''))]
                  ]
                   /ancestor-or-self::node()
               )
         )
     ]

III.替代解决方案(可能需要“多次清理”)

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "*[not(string(translate(., '&#xA0;', '')))
  and
    not(descendant-or-self::*
          [self::img or self::input or self::br])]"/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
    not(descendant-or-self::*[self::img or self::input or self::br])
    ]
     [not(ancestor::*
             [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                         and
                           not(descendant-or-self::*
                                  [self::img or self::input or self::br])
                          ]
                    )
             =
              count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                      and
                        not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                        ]
                   )
              ]
          )
     ]
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
   "//*[not(normalize-space((translate(., '&#xA0;', ''))))
      and
        not(descendant-or-self::*[self::img or self::input or self::br])
       ]
        [not(ancestor::*
               [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                           and
                             not(descendant-or-self::*
                                    [self::img or self::input or self::br])
                             ]
                      )
               =
                count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                        and
                          not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                          ]
                      )
               ]
            )
        ]
 "/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:variable name="vAllEmpty" select=
      "//*[not(normalize-space((translate(., '&#xA0;', ''))))
         and
           not(descendant-or-self::*
                 [self::img or self::input or self::br])

          ]"/>

     <xsl:variable name="vTopEmpty" select=
     "$vAllEmpty[not(ancestor::* intersect $vAllEmpty)]"/>

     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="*[. intersect $vTopEmpty]"/>
</xsl:stylesheet>
  //node()
    [self::input or self::img or self::br
    or
     self::text()[normalize-space(translate(.,'&#xA0;',''))]
    ]
     /ancestor-or-self::node()
  //node()
     [not(count(.
              |
                //node() 
                   [self::input or self::img or self::br
                  or
                    self::text()[normalize-space(translate(.,'&#xA0;',''))]
                   ]
                    /ancestor-or-self::node()
                )
        =
         count(//node()
                  [self::input or self::img or self::br
                 or
                   self::text()[normalize-space(translate(.,'&#xA0;',''))]
                  ]
                   /ancestor-or-self::node()
               )
         )
     ]
另一种方法不是尝试指定要删除的节点,而是指定要保留的节点——然后要删除的节点是所有节点和要保留的节点之间的设置差异

要保留的节点由此XPath表达式选择

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "*[not(string(translate(., '&#xA0;', '')))
  and
    not(descendant-or-self::*
          [self::img or self::input or self::br])]"/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
    not(descendant-or-self::*[self::img or self::input or self::br])
    ]
     [not(ancestor::*
             [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                         and
                           not(descendant-or-self::*
                                  [self::img or self::input or self::br])
                          ]
                    )
             =
              count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                      and
                        not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                        ]
                   )
              ]
          )
     ]
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
   "//*[not(normalize-space((translate(., '&#xA0;', ''))))
      and
        not(descendant-or-self::*[self::img or self::input or self::br])
       ]
        [not(ancestor::*
               [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                           and
                             not(descendant-or-self::*
                                    [self::img or self::input or self::br])
                             ]
                      )
               =
                count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                        and
                          not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                          ]
                      )
               ]
            )
        ]
 "/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:variable name="vAllEmpty" select=
      "//*[not(normalize-space((translate(., '&#xA0;', ''))))
         and
           not(descendant-or-self::*
                 [self::img or self::input or self::br])

          ]"/>

     <xsl:variable name="vTopEmpty" select=
     "$vAllEmpty[not(ancestor::* intersect $vAllEmpty)]"/>

     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="*[. intersect $vTopEmpty]"/>
</xsl:stylesheet>
  //node()
    [self::input or self::img or self::br
    or
     self::text()[normalize-space(translate(.,'&#xA0;',''))]
    ]
     /ancestor-or-self::node()
  //node()
     [not(count(.
              |
                //node() 
                   [self::input or self::img or self::br
                  or
                    self::text()[normalize-space(translate(.,'&#xA0;',''))]
                   ]
                    /ancestor-or-self::node()
                )
        =
         count(//node()
                  [self::input or self::img or self::br
                 or
                   self::text()[normalize-space(translate(.,'&#xA0;',''))]
                  ]
                   /ancestor-or-self::node()
               )
         )
     ]
则要删除的节点为

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
 "*[not(string(translate(., '&#xA0;', '')))
  and
    not(descendant-or-self::*
          [self::img or self::input or self::br])]"/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
//*[not(normalize-space((translate(., '&#xA0;', ''))))
  and
    not(descendant-or-self::*[self::img or self::input or self::br])
    ]
     [not(ancestor::*
             [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                         and
                           not(descendant-or-self::*
                                  [self::img or self::input or self::br])
                          ]
                    )
             =
              count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                      and
                        not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                        ]
                   )
              ]
          )
     ]
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
   "//*[not(normalize-space((translate(., '&#xA0;', ''))))
      and
        not(descendant-or-self::*[self::img or self::input or self::br])
       ]
        [not(ancestor::*
               [count(.| //*[not(normalize-space((translate(., '&#xA0;', ''))))
                           and
                             not(descendant-or-self::*
                                    [self::img or self::input or self::br])
                             ]
                      )
               =
                count(//*[not(normalize-space((translate(., '&#xA0;', ''))))
                        and
                          not(descendant-or-self::*
                                 [self::img or self::input or self::br])
                          ]
                      )
               ]
            )
        ]
 "/>
</xsl:stylesheet>
<html>
   <a href="http://example.com">good</a>
   <br/>
   <img src="http://example.com/logo.png"/>
</html>
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:variable name="vAllEmpty" select=
      "//*[not(normalize-space((translate(., '&#xA0;', ''))))
         and
           not(descendant-or-self::*
                 [self::img or self::input or self::br])

          ]"/>

     <xsl:variable name="vTopEmpty" select=
     "$vAllEmpty[not(ancestor::* intersect $vAllEmpty)]"/>

     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
     </xsl:template>

     <xsl:template match="*[. intersect $vTopEmpty]"/>
</xsl:stylesheet>
  //node()
    [self::input or self::img or self::br
    or
     self::text()[normalize-space(translate(.,'&#xA0;',''))]
    ]
     /ancestor-or-self::node()
  //node()
     [not(count(.
              |
                //node() 
                   [self::input or self::img or self::br
                  or
                    self::text()[normalize-space(translate(.,'&#xA0;',''))]
                   ]
                    /ancestor-or-self::node()
                )
        =
         count(//node()
                  [self::input or self::img or self::br
                 or
                   self::text()[normalize-space(translate(.,'&#xA0;',''))]
                  ]
                   /ancestor-or-self::node()
               )
         )
     ]

但是,请注意这些都是要删除的节点,而不仅仅是“要删除的顶部节点”。可以只表示“要删除的顶部节点”,但得到的表达式相当复杂。如果试图删除所有要删除的节点,则会出现错误,因为“要删除的顶级节点”的子节点按文档顺序跟随它们。

实现所需结果的最简单方法是在文本中使用正则表达式。注意:您必须多次使用这个表达式,因为它不是贪婪的,它只删除最低的空子节点,所以要删除所有空节点,我们必须多次调用正则表达式

以下是解决方案:

<?
$text = '<div class="empty">
    <div>&nbsp;</div>
    <div></div>
</div>
<a href="http://example.com">good</a>
<div>
    <p></p>
</div>
<br>
<img src="http://example.com/logo.png" />
<div></div>';

// recursive function
function recreplace($text)
{
    $restext = preg_replace("/<div(.*)?>((\s|&nbsp;)*|(\s|&nbsp;)*<p>(\s|&nbsp;)*<\/p>(\s|&nbsp;)*)*<\/div>/U", '', $text);
    if ($text != $restext) 
    {
        recreplace($restext);
    }
    else
    {
        return $restext;
    }
}

print recreplace($text);
?>

那么您想要文本节点,

,以及它们的祖先

您可以使用
//br
//img
获取所有br和img

可以使用
//text()
获取所有文本节点,使用
//text()[normalize-space()]
获取所有非空文本节点。(尽管您可能需要类似于
//text()[规范化空间(翻译(,'',))]
的东西来过滤
文本节点,如果您的xml解析器还没有这样做的话)

您可以获得所有具有
祖先或自我::*
的父母

所以得到的表达式是

//br/ancestor-or-self::* | //img/ancestor-or-self::* | //text()[normalize-space()]/ancestor-or-self::*
XPath 2中的缩写为:

(//br | //img | //text()[normalize-space()])/ancestor-or-self::*

两点:1)由于要删除的一些顶级节点实际上不是空的(它们有子节点,其中一个节点中有一个非中断空间实体,这在技术上是实际内容),因此您必须反复运行查询,直到没有剩余的节点为止(您似乎已经意识到)如果要删除大量深度嵌套的层,则计算成本可能非常高。2) 删除空节点并不一定总是安全的。它很容易破坏依赖这些元素来实现正确间距和浮动的CSS规则。@DaveRandom,很好,我没有考虑CSS规则。然而,对于我的用例来说,这不是一个问题——额外的计算时间也不是问题。这些DOM结构不用于向用户显示。while循环似乎正在停止,但仍有DOM元素需要清除。它目前正在生成什么输出?哪个