In PHP, why must I execute the removeChild method twice on a DOMNode that just had its style attribute removed?
I have a PHP script that removes empty paragraphs from an HTML file. The empty paragraphs are those <p> elements that have no text content.
HTML file with empty paragraphs:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!--
This page is used with remove_empty_paragraphs.php script.
This page contains empty paragraphs. The script removes the empty paragraphs and
writes a new HTML file.
-->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title></title>
</head>
<body>
<p>This is a paragraph.</p>
<!-- Below is an empty paragraph. -->
<p><span></span></p>
<p>This is another paragraph.</p>
<!-- Below is another empty paragraph. -->
<p class=MsoNormal><b></b></p>
<p style=''></p>
<p>
<span lang=EN-US style='font-size:5.0pt;color:navy;mso-ansi-language:EN-`US'></span>
</p>
</body>
</html>
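The script should succeed in:
- removing empty paragraphs that have no style attribute
- removing empty paragraphs that have a style attribute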
$html = new DOMDocument("1.0", "UTF-8");
$html->loadHTMLFile("HTML File with Empty Paragraphs.html");
$pars = $html->getElementsByTagName("p");

/* removeStyleAttribute foreach-loop */
foreach ($pars as $par) {
    if ($par->hasAttribute("style")) {
        $par->removeAttribute("style");
    }
}

/* removeChild foreach-loop */
foreach ($pars as $par) {
    if ($par->textContent == "") {
        $par->parentNode->removeChild($par);
    }
}

$html->saveHTMLFile("HTML File WithOut Empty Paragraphs.html");
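This succeeds in:
- removing the style attribute from empty paragraphs that have one
- removing empty paragraphs that have no style attribute
But it fails at:
- removing the empty paragraphs from which the style attribute was removed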
$html = new DOMDocument("1.0", "UTF-8");
$html->loadHTMLFile("HTML File with Empty Paragraphs.html");
$pars = $html->getElementsByTagName("p");

/* removeStyleAttribute foreach-loop */
foreach ($pars as $par) {
    if ($par->hasAttribute("style")) {
        $par->removeAttribute("style");
    }
}

/* First removeChild foreach-loop */
foreach ($pars as $par) {
    if ($par->textContent == "") {
        $par->parentNode->removeChild($par);
    }
}

/* Second removeChild foreach-loop, identical to the first removeChild foreach-loop */
foreach ($pars as $par) {
    if ($par->textContent == "") {
        $par->parentNode->removeChild($par);
    }
}

$html->saveHTMLFile("HTML File WithOut Empty Paragraphs.html");
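This works very well, but having two identical loops, one right after the other, is odd.
I also tried to handle everything in a single loop.
Fourth attempt: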
$html = new DOMDocument("1.0", "UTF-8");
$html->loadHTMLFile("HTML File with Empty Paragraphs.html");
$pars = $html->getElementsByTagName("p");

foreach ($pars as $par) {
    if ($par->textContent == "") {
        if ($par->hasAttribute("style")) {
            $par->removeAttribute("style");
        }
        $par->parentNode->removeChild($par);
    }
}

$html->saveHTMLFile("HTML File WithOut Empty Paragraphs.html");
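This succeeds in:
- removing empty paragraphs that have no style attribute
- removing the style attribute from empty paragraphs that have one
But it fails at:
- removing empty paragraphs that have a style attribute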
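As Tomalak said, this may be related to whitespace.
Try disabling preserveWhiteSpace:
Hmm, I'm new here; how do I post my answer as a comment rather than as an answer?

The list returned by getElementsByTagName is live: removing nodes from the document also removes them from the list. Because foreach does not know that the list has changed, it happily moves on to the next index, which is now really the item two positions further on, because the DOMNodeList has been renumbered. Some tags are simply skipped.

Solution: use a for loop (with $pars->item(X) and $pars->length) instead of foreach, and only increment the index when no node was removed. (Or always increment, and step back by one whenever a node was removed.)

Also: the last <p> (the one with the big <span>) is not removed because of the whitespace inside it. Use trim() to clear it.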
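The index-based loop described in that answer can be sketched as follows. This is a minimal illustration, not the answerer's own code: the inline `$source` sample and the `$before` / `$after` variables are assumptions made for the example, whereas the question loads its HTML from a file with loadHTMLFile.

```php
<?php
// Hypothetical inline sample standing in for the question's HTML file.
$source = <<<'HTML'
<html><body>
<p>This is a paragraph.</p>
<p><span></span></p>
<p class="MsoNormal"><b></b></p>
<p style=""></p>
<p>
<span lang="EN-US" style="font-size:5.0pt"></span>
</p>
<p>This is another paragraph.</p>
</body></html>
HTML;

$html = new DOMDocument("1.0", "UTF-8");
$html->loadHTML($source);

// getElementsByTagName returns a live DOMNodeList.
$pars = $html->getElementsByTagName("p");
$before = $pars->length;

$i = 0;
while ($i < $pars->length) {
    $par = $pars->item($i);
    // trim() so that paragraphs containing only whitespace count as empty.
    if (trim($par->textContent) === "") {
        // Removing the node shrinks the live list, so the next
        // candidate slides into index $i: do not advance.
        $par->parentNode->removeChild($par);
    } else {
        $i++;
    }
}

$after = $pars->length;
echo $html->saveHTML();
```

Because `$pars->length` is re-read on every pass, the loop ends as soon as the shrinking list has been fully examined, and a single pass is enough. Another option, if index bookkeeping feels fragile, is to copy the list into a plain array first (for example with `iterator_to_array($pars)`) and run foreach over the copy.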
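See also my reply in [...]. It is not a solution to your problem (or maybe it is), but it might be worth looking into. I thought I'd mention it in case you had never heard of it!

Did you notice
if ($par->hasAttribute("syle")) {
? That is a typo: "syle" should be "style".

Your setup works for me (tentatively): maybe you left something else out? Could you post a minimal HTML example that fails reproducibly? (For example, note that textContent may contain whitespace, which you do not check for.)

It works for me too, with only the first code section. Which version of PHP are you running?

@Andy E, I fixed the typo, thanks. But it still doesn't work.

@Brik, none of the paragraphs contains whitespace. I am using a small test file that I made with Netbeans,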