PHP: How to remove empty HTML tags (containing whitespace and/or their HTML codes)

php html regex preg-replace

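I need a regular expression for preg_replace.

This question is not answered in the "other question", because not all of the tags I want to remove are strictly empty.

I want to remove not only empty tags from an HTML structure, but also tags that contain only line breaks, whitespace and/or the HTML codes for whitespace.

The possible codes are:

&thinsp; &ensp; &emsp;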

Before removing the matching tags:

<div> 
  <h1>This is a html structure.</h1> 
  <p>This is not empty.</p> 
  <p></p> 
  <p><br /></p>
  <p> <br /> &thinsp;</p>
  <p>&nbsp;</p> 
  <p> &nbsp; </p> 
</div>

After removing the matching tags:

<div> 
  <h1>This is a html structure.</h1> 
  <p>This is not empty.</p> 
</div>

You can use the following regular expression:

<([^>\s]+)[^>]*>(?:\s*(?:<br \/>|&nbsp;|&thinsp;|&ensp;|&emsp;|&#8201;|&#8194;|&#8195;)\s*)*<\/\1>

and replace each match with

''

(an empty string).

Note: this also works for empty HTML tags that carry attributes.
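As a minimal sketch of how the pattern above might be applied, the snippet below wraps it in a small helper. The function name removeEmptyTags and the `~` delimiters are assumptions made for illustration; only the pattern itself comes from the answer.

```php
<?php
// Hypothetical helper: strips elements whose content consists only of
// <br />, whitespace entities, and the whitespace around them.
function removeEmptyTags(string $html): string
{
    // Pattern taken from the answer; "~" is used as the delimiter here.
    $pattern = '~<([^>\s]+)[^>]*>(?:\s*(?:<br \/>|&nbsp;|&thinsp;|&ensp;|&emsp;|&#8201;|&#8194;|&#8195;)\s*)*<\/\1>~';
    return preg_replace($pattern, '', $html);
}

$html = <<<HTML
<div>
  <h1>This is a html structure.</h1>
  <p>This is not empty.</p>
  <p></p>
  <p><br /></p>
  <p> <br /> &thinsp;</p>
  <p>&nbsp;</p>
  <p> &nbsp; </p>
</div>
HTML;

// Prints the structure with the five empty paragraphs removed
// (the whitespace that surrounded them is left in place).
echo removeEmptyTags($html);
```

One caveat that follows from the way the pattern is written: whitespace is only consumed next to a `<br />` or an entity, so a tag holding nothing but plain whitespace, such as `<p> </p>`, is left untouched.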

Use Tidy. It can be called through the following function:

function cleaning($string, $tidyConfig = null) {
    $out = array();
    // Default Tidy options, used when the caller passes no configuration.
    $config = array(
        'indent' => true,
        'show-body-only' => false,
        'clean' => true,
        'output-xhtml' => true,
        'preserve-entities' => true
    );
    if ($tidyConfig == null) {
        $tidyConfig = $config;
    }
    // Let Tidy repair the markup; the result is a complete document.
    $tidy = new tidy();
    $out['full'] = $tidy->repairString($string, $tidyConfig, 'UTF8');
    // Keep only what sits between <body> and </body>.
    $out['body'] = preg_replace("/.*<body[^>]*>|<\/body>.*/si", "", $out['full']);
    // Keep only what sits between <style> and </style>, wrapped in a fresh style tag.
    $out['style'] = '<style type="text/css">' . preg_replace("/.*<style[^>]*>|<\/style>.*/si", "", $out['full']) . '</style>';
    return $out;
}


I'm not very good at regex, but try this:

\<.*\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\<\s*br\s*\/\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\&.*sp;\s*\<\s*br\s*\/\>\<\/.*\>

Basically, it matches:

a tag containing an HTML space entity, or
a tag with a break before the HTML space entity, or
a tag with a break after the HTML space entity
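The three alternatives listed above rely on greedy `.*`, which is worth trying on sample input before use. The sketch below applies the pattern as given (the `~` delimiters and the variable names are assumptions added for illustration) to the same markup written on two lines and on one line.

```php
<?php
// The pattern from this answer, unchanged apart from the added "~" delimiters.
$pattern = '~\<.*\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\<\s*br\s*\/\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\&.*sp;\s*\<\s*br\s*\/\>\<\/.*\>~';

// One tag per line: "." does not cross the newline, so only the empty
// paragraph on the second line is removed.
$multiLine = "<p>keep</p>\n<p>&nbsp;</p>";
var_dump(preg_replace($pattern, '', $multiLine)); // string(12) "<p>keep</p>\n" (with a real newline)

// Same markup on a single line: the greedy ".*" runs from the first "<"
// up to the last tag before the entity, so the non-empty paragraph is removed too.
$oneLine = "<p>keep</p><p>&nbsp;</p>";
var_dump(preg_replace($pattern, '', $oneLine)); // string(0) ""
```

So this pattern seems safe only when every tag sits on its own line; for minified or single-line HTML, a pattern with `[^>]*` inside the tags, as in the first answer, avoids the over-match.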

Possible duplicate.

Hi, I updated your test: it fails if whitespace is added inside the tag. You just missed one piece before closing the capture group ;)

This is a good answer, but it should be updated to handle both cases, and to exclude tags such as iframe, canvas, etc., in the preg_replace call on $html.
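The exact pattern proposed in the comments did not survive, so the following is only a sketch of what such an update could look like, not the commenter's code. The function name removeBlankTags, the negative lookahead for iframe and canvas, and the repeat-until-stable loop are all assumptions.

```php
<?php
// Hypothetical variant: also treats plain whitespace as "empty" content and
// never removes <iframe> or <canvas>, which are meaningful even when empty.
function removeBlankTags(string $html): string
{
    $pattern = '~<((?!iframe\b|canvas\b)[^>\s/]+)[^>]*>'
             . '(?:\s|<br\s*/?>|&nbsp;|&thinsp;|&ensp;|&emsp;|&#8201;|&#8194;|&#8195;)*'
             . '</\1>~i';

    // Repeat until nothing more matches, so that a parent which becomes
    // empty after its children are removed is removed as well.
    do {
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);

    return $html;
}

echo removeBlankTags('<div><p> </p><p>&nbsp;<br></p></div><p>keep</p><iframe src="x"></iframe>');
// prints: <p>keep</p><iframe src="x"></iframe>
```

The list of excluded tag names is a judgment call; other elements that are valid without content (for example textarea or td) could be added to the lookahead in the same way.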