PHP: Check a string for heading tags and inner numbered-list levels

php regex

I need to correct a string that has wrong heading tags and missing p tags:

<h3>1. Title</h3>
Text
<h3>1.1 Subtitle</h3>
Text
<h3>1.2. Subtitle</h3>

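The code below adds the missing p tags, but I don't know how to handle the heading tags and the optionally missing dot on the second level.

This is what I tried: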

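$lines = explode(PHP_EOL, $text);
$output = ''; // initialise before appending
foreach ($lines as $line) {
    // wrap every line that contains no heading tag in <p>
    if (strpos($line, '<h') === false) {
        $line = '<p>' . $line . '</p>';
    }
    $output .= $line;
}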

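This uses a regular expression to pick out the different parts and derives the heading level from the numbering (h2 for 1., h3 for 1.2, and so on). It will work if the HTML you are parsing really is as simple as your example. If it is not, I strongly recommend looking at a parser instead.
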
$html = <<<EOS
<h3>1. Title</h3>
Text
<h3>1.1 Subtitle</h3>
Text
<h3>1.2. Subtitle</h3>
Text
EOS;

$lines = explode(PHP_EOL, $html);

foreach ($lines as $line) {
    if (preg_match('/^<(\w.*?)>([\d\.]*)(.*?)</', $line, $matches)) {
        $tag    = $matches[1]; // e.g. "h3"
        $number = $matches[2]; // e.g. "1.1" or "1.2."
        $title  = $matches[3]; // e.g. " Subtitle" (leading space included)

        if ($tag == 'h3') {
            $level = preg_match_all('/\d+/', $number) + 1;
            $tag = 'h' . $level;
            if (substr($number, -1, 1) != '.')
                $number .= '.';

            $line = "<$tag>$number$title</$tag>";
        }
    }
    else {
        $line = "<p>$line</p>";
    }
    echo $line, PHP_EOL;
}
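
The loop above depends on PHP_EOL matching the line endings actually present in the input, and it only rewrites h3 tags. The following is a minimal sketch of the same idea wrapped in a function, assuming the input stays as simple as the example (one tag or one text run per line). The name fixOutline is hypothetical and does not come from the original answer; splitting on \R is a substitution for explode(PHP_EOL, ...), so that \n and \r\n input behave the same.

```php
<?php
// Hypothetical helper (not from the original answers): fixes heading levels,
// trailing dots and missing <p> tags for simple line-based input.
function fixOutline(string $html): string
{
    $out = [];
    // \R matches \n, \r\n and \r, so the result does not depend on PHP_EOL.
    foreach (preg_split('/\R/', $html) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        // Heading with a numeric prefix such as "1.", "1.1" or "1.2."
        if (preg_match('~^<h[1-6]>\s*(\d+(?:\.\d+)*)\.?\s*(.*?)\s*</h[1-6]>$~', $line, $m)) {
            // "1" -> h2, "1.1" -> h3, and so on; capped at h6
            $level = min(substr_count($m[1], '.') + 2, 6);
            $out[] = "<h$level>{$m[1]}. {$m[2]}</h$level>";
        } elseif ($line[0] === '<') {
            $out[] = $line; // already tagged, leave untouched
        } else {
            $out[] = "<p>$line</p>";
        }
    }
    return implode("\n", $out);
}

$html = "<h3>1. Title</h3>\nText\n<h3>1.1 Subtitle</h3>\nText\n<h3>1.2. Subtitle</h3>\nText";
echo fixOutline($html), "\n";
```

The heading level is taken from the number of dots inside the number rather than from the existing tag, so a wrongly tagged heading is corrected regardless of which h-level it started with.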

How about this:

$text = '<h3>1. Title</h3>
         Text
         <h3>1.1 Subtitle</h3>
         Text
         <h3>1.2. Subtitle</h3>';
$lines = explode(PHP_EOL, $text);

// Only the first node needs h3 replaced with h2
$lines[0] = str_replace('h3', 'h2', $lines[0]);

// Add the missing dot after second-level numbers
$search_str  = array('.1 ', '.2 ');
$replace_str = array('.1. ', '.2. ');

foreach ($lines as $line) {
    $line = trim($line); // drop the indentation inside the string literal
    if (!strchr($line, '<')) {
        $line = '<p>' . $line . '</p>';
    }
    $line = str_replace($search_str, $replace_str, $line);
    print $line . PHP_EOL;
}
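
The search and replace arrays above only cover numbers ending in .1 and .2. As a hedged alternative, the missing dot could be added for any numbering depth with a single preg_replace call. This is a sketch under the assumption that each heading sits on its own line and starts with the number; addTrailingDot is a hypothetical name, not part of the original answer.

```php
<?php
// Hypothetical helper: appends the missing dot after a heading number.
function addTrailingDot(string $line): string
{
    // Group 1: opening heading tag plus a number like "1" or "1.1".
    // Group 2: the whitespace directly after the number, which means
    // no dot is present yet. Already dotted numbers do not match.
    return preg_replace('~^(\s*<h[1-6]>\d+(?:\.\d+)*)(\s)~', '$1.$2', $line);
}

echo addTrailingDot('<h3>1.1 Subtitle</h3>'), "\n";  // <h3>1.1. Subtitle</h3>
echo addTrailingDot('<h3>1.2. Subtitle</h3>'), "\n"; // <h3>1.2. Subtitle</h3> (unchanged)
echo addTrailingDot('<h3>2.10 Subtitle</h3>'), "\n"; // <h3>2.10. Subtitle</h3>
```

Lines that do not start with a heading tag, such as plain text lines, are returned unchanged.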

In general, HTML and regular expressions do not mix well. I suggest trying to find a solution that does not involve regular expressions.

I want to check whether the start of a line matches 1. or 1.1, which means I have to check for a heading tag at the start, followed by a digit, a dot, and then either a space or another digit…

You should not do this with ==, because the position of …

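$lines = explode(PHP_EOL, $text);
$output = ''; // initialise before appending
foreach ($lines as $key => $line) {
    // odd indexes hold the text lines that need a <p> wrapper
    if ($key % 2 != 0) {
        $line = '<p>' . $line . '</p>';
    }
    $output .= $line;
}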
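
The index-based loop above only works while headings and text lines strictly alternate. A content-based check that still avoids regular expressions could be built from plain string functions. The sketch below assumes headings always open with a literal <h3> at the start of the line, as in the question; outlineDepth is a hypothetical name introduced for illustration.

```php
<?php
// Hypothetical helper: returns the outline depth of a heading line
// ("1." -> 1, "1.1" -> 2, "1.2." -> 2) or 0 if the line is not a numbered <h3>.
function outlineDepth(string $line): int
{
    $line = trim($line);
    if (strncmp($line, '<h3>', 4) !== 0) {
        return 0; // no heading tag at the start of the line
    }
    $text = substr($line, 4);
    // Length of the leading run of digits and dots, e.g. "1.2." -> 4
    $len = strspn($text, '0123456789.');
    // Strip a trailing dot so "1.2." and "1.2" count the same
    $number = trim(substr($text, 0, $len), '.');
    if ($number === '') {
        return 0; // heading without a number
    }
    return substr_count($number, '.') + 1;
}

echo outlineDepth('<h3>1. Title</h3>'), "\n";      // 1
echo outlineDepth('<h3>1.1 Subtitle</h3>'), "\n";  // 2
echo outlineDepth('<h3>1.2. Subtitle</h3>'), "\n"; // 2
echo outlineDepth('Text'), "\n";                   // 0
```

A depth of 1 would then map to h2 and a depth of 2 to h3, while a depth of 0 marks a line that should be wrapped in a p tag or left alone.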

Output of the regular-expression answer:

<h2>1. Title</h2>
<p>Text</p>
<h3>1.1. Subtitle</h3>
<p>Text</p>
<h3>1.2. Subtitle</h3>
<p>Text</p>
