In PHP, why do I have to run removeChild twice on DOMNodes whose style attribute I have just removed?


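I have a PHP script that removes empty paragraphs from an HTML file. Empty paragraphs are those <p> elements that have no text content.

The HTML file with empty paragraphs: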

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <!--
    This page is used with remove_empty_paragraphs.php script.
    This page contains empty paragraphs. The script removes the empty paragraphs and
    writes a new HTML file.
    -->
    <html>
        <head>
            <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
            <title></title>
        </head>
        <body>
            <p>This is a paragraph.</p>
            <!-- Below is an empty paragraph. -->
            <p><span></span></p>
            <p>This is another paragraph.</p>
            <!-- Below is another empty paragraph. -->
            <p class=MsoNormal><b></b></p>
            <p style=''></p>
            <p>
                <span lang=EN-US style='font-size:5.0pt;color:navy;mso-ansi-language:EN-`US'></span>
            </p>
        </body>
    </html>
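My first attempt, which used only the removeChild foreach-loop, succeeds in:

  • removing empty paragraphs that have no style attribute
but fails to:

  • remove empty paragraphs that have a style attribute
So I inserted a removeStyleAttribute foreach-loop before the removeChild foreach-loop. (I don't mind the style attribute being removed from non-empty paragraphs as well.)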

Second attempt:

    $html = new DOMDocument("1.0", "UTF-8");
    $html->loadHTMLFile("HTML File with Empty Paragraphs.html");
    $pars = $html->getElementsByTagName("p");

    /* removeStyleAttribute foreach-loop */
    foreach ($pars as $par) {
        if ($par->hasAttribute("style")) {
            $par->removeAttribute("style");
        }
    }

    /* removeChild foreach-loop */
    foreach ($pars as $par) {
        if ($par->textContent == "") {
            $par->parentNode->removeChild($par);
        }
    }

    $html->saveHTMLFile("HTML File WithOut Empty Paragraphs.html");
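This succeeds in:

  • removing the style attribute from empty paragraphs that have one
  • removing empty paragraphs that have no style attribute
but fails to:

  • remove the empty paragraphs from which the style attribute has just been removed
So I had to use two removeChild foreach-loops, one right after the other.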

Third attempt:

    $html = new DOMDocument("1.0", "UTF-8");
    $html->loadHTMLFile("HTML File with Empty Paragraphs.html");
    $pars = $html->getElementsByTagName("p");

    /* removeStyleAttribute foreach-loop */
    foreach ($pars as $par) {
        if ($par->hasAttribute("style")) {
            $par->removeAttribute("style");
        }
    }

    /* First removeChild foreach-loop */
    foreach ($pars as $par) {
        if ($par->textContent == "") {
            $par->parentNode->removeChild($par);
        }
    }

    /* Second removeChild foreach-loop, identical to the first removeChild foreach-loop */
    foreach ($pars as $par) {
        if ($par->textContent == "") {
            $par->parentNode->removeChild($par);
        }
    }

    $html->saveHTMLFile("HTML File WithOut Empty Paragraphs.html");
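This works perfectly, but having two identical loops one right after the other is odd.

I also tried handling everything in a single loop.

Fourth attempt: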

    $html = new DOMDocument("1.0", "UTF-8");
    $html->loadHTMLFile("HTML File with Empty Paragraphs.html");
    $pars = $html->getElementsByTagName("p");

    foreach ($pars as $par) {
        if ($par->textContent == "") {
            if ($par->hasAttribute("style")) {
                $par->removeAttribute("style");
            }
            $par->parentNode->removeChild($par);
        }
    }

    $html->saveHTMLFile("HTML File WithOut Empty Paragraphs.html");
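This succeeds in:

  • removing empty paragraphs that have no style attribute
but fails to:

  • remove the style attribute from empty paragraphs that have one
  • remove empty paragraphs that have a style attribute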

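    • As Tomalak said, this may be related to whitespace. Try disabling "preserve whitespace":


      Hmm, I'm new here: how do I post my answer as a comment instead of an answer?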

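      The list returned by getElementsByTagName is live: removing nodes from the document also removes them from the list. Because foreach does not know that the list has changed, it happily moves on to the next index, which now holds the node two positions further along, because the DOMNodeList has been re-indexed. Some tags are simply skipped.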
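
To see the live behaviour described above in isolation, here is a minimal sketch. The three-paragraph markup is made up for illustration and is not the test file from the question.

```php
<?php
// Hypothetical three-paragraph document, used only to illustrate the live list.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p></p><p></p><p></p></body></html>');

$pars = $doc->getElementsByTagName('p');
$lengthBefore = $pars->length;          // counts the three <p> elements

// Remove the first paragraph from the document (not from $pars directly).
$first = $pars->item(0);
$first->parentNode->removeChild($first);

$lengthAfter = $pars->length;           // the same list object now reports one fewer
$sameNode = $pars->item(0) === $first;  // false: index 0 now refers to another <p>

printf("before: %d, after: %d\n", $lengthBefore, $lengthAfter);
```

Since every remaining node moves down one index after a removal, a loop that steps forward regardless will pass over the node that slid into the current position.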

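      Solution: use a for-loop (with $pars->item(X) and $pars->length) instead of foreach, but increment the index only when no node was removed. (Alternatively, always increment and step back by one whenever a node was removed.)

      Also: the last <p> (the one with the big <span>) is not removed because it contains whitespace. Use trim() to strip it.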
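
The for-loop suggested above could look like the following sketch. The function name removeEmptyParagraphs and the sample markup are assumptions made for illustration, not code from the question; the loop advances the index only when nothing was removed and uses trim() so that whitespace-only paragraphs count as empty.

```php
<?php
/**
 * Removes every <p> whose text content is empty after trimming.
 * Hypothetical helper; returns the number of removed paragraphs.
 */
function removeEmptyParagraphs(DOMDocument $doc): int
{
    $pars = $doc->getElementsByTagName('p');
    $removed = 0;

    // No increment expression: $i only advances when the node is kept.
    for ($i = 0; $i < $pars->length; ) {
        $par = $pars->item($i);
        if (trim($par->textContent) === '') {
            // The live list shrinks, so the next candidate slides into index $i.
            $par->parentNode->removeChild($par);
            $removed++;
        } else {
            $i++;
        }
    }

    return $removed;
}

// Hypothetical sample markup modelled on the test file in the question.
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body>'
    . '<p>This is a paragraph.</p>'
    . '<p><span></span></p>'
    . '<p style=""></p>'
    . "<p>\n    <span lang=\"EN-US\"></span>\n</p>"
    . '<p>This is another paragraph.</p>'
    . '</body></html>'
);

$removedCount = removeEmptyParagraphs($doc);
$remaining = $doc->getElementsByTagName('p')->length;
```

Called once on the loaded document, a function like this would stand in for both removeChild foreach-loops, and removing the style attribute first is no longer needed. Another option that should have the same effect is to copy the list with iterator_to_array($pars) and run foreach over that static copy.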


      See also my reply in .

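      Not a solution to your problem (or maybe it is), but possibly something worth looking into. I thought I'd mention it in case you had never heard of it!

      Did you notice if ($par->hasAttribute("syle")) {? That is a typo: "syle" should be "style".

      Your setup works for me (tentatively): maybe you left something else out? Could you post a minimal HTML example that reproducibly fails? (For example, note that textContent may contain whitespace, which you don't check for.)

      Only the first code section works for me too. What version of PHP are you running?

      @Andy E, I fixed the typo, thanks. But it still doesn't work.

      @Brik, none of the paragraphs contains whitespace. I'm using a small test file that I made with Netbeans,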