PHP: Matching basic HTML with a regular expression (or any other way)

I have some HTML that looks like this:

    <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)

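I would like to split it into two arrays:

Array 1 - (this will contain the <b> tags and their contents)

Array 2 - (this will contain everything outside the <b> tags)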

I tried using regular expressions and preg_match_all, but I can't seem to get my head around them. Any help would be greatly appreciated.

Thanks


<?php 
$string = '    <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)';
// Group 1: a <b>...</b> element; group 2: the text after it, up to the next "<".
preg_match_all("#(<b>[^<]+<\/b>)([^<]+)#", $string, $matches);
print_r($matches);
?>

Output:

Array
(
    [0] => Array
        (
            [0] => <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)

            [1] => <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...

            [2] => <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
        )

    [1] => Array
        (
            [0] => <b>This is a title: </b>
            [1] => <b>Some more text: </b>
            [2] => <b>Hello world!: </b>
        )

    [2] => Array
        (
            [0] =>  0091 + Two + 423 + Four + (Five, Six, Seven)

            [1] =>  Abc + Hi + Random + Text + (Hello, 522, Four)
    ...

            [2] =>  Test + Foo + 1122 + (120, 122, Four)
        )

)
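The title also asks about "any other way", and the comments below recommend an HTML parser over regular expressions. As an alternative, here is a minimal sketch using PHP's built-in DOMDocument (not the simple_html_dom library mentioned in the comments; it assumes the dom extension is enabled). It assumes each value is the text between one <b> element and the next. splitBoldSegments is a hypothetical helper name, and unlike the regex answers it returns the label text without the <b> tags.

```php
<?php
// Sketch (assumption): each value is the text that follows a <b> element
// up to the next <b> element. splitBoldSegments is a hypothetical helper.
function splitBoldSegments(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so all the <b> elements share one parent.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $labels = [];
    $values = [];
    foreach ($doc->getElementsByTagName('b') as $bold) {
        $labels[] = trim($bold->textContent);
        // Concatenate the siblings after this <b> until the next <b>.
        $text = '';
        for ($node = $bold->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node instanceof DOMElement && $node->tagName === 'b') {
                break;
            }
            $text .= $node->textContent;
        }
        $values[] = trim($text);
    }
    return [$labels, $values];
}

$html = <<<HTML
<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
HTML;

[$labels, $values] = splitBoldSegments($html);
print_r($labels);
print_r($values);
```

Because the parser builds a tree, values that contain other inline tags (for example <i>) are still collected whole, which a [^<]+ character class would cut off.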

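Comments:

Don't use regular expressions to parse HTML. You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down that road. As soon as the HTML changes from what you expect, your code will break. For examples of how to parse HTML properly with PHP modules that have already been written, tested, and debugged, see the link!

I'm already using the simple_html_dom library (mentioned in the link you posted). I was having particular difficulty with this string, so I decided to use a regular expression, just for this case. For everything else I'll use the HTML parser library. Thanks for your input :)

阿克兰, this works perfectly! Thanks a lot :) Any good suggestions for learning more about regex?

Try it yourself, regex is easy to learn... search Google.

You can try the following: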

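<pre>
<?php
// The <pre> outside the PHP tags keeps the print_r output readable in a browser.

$subject =<<<LOD
<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
LOD;

// Group 1: a <b>...</b> element. Group 2: everything up to the next "<b",
// matched atomically as runs of non-"<" characters or a "<" not followed by "b".
$pattern = '~(<b>.*?</b>)((?>[^<]+|<(?!b))*)~';
preg_match_all($pattern, $subject, $matches);

array_shift($matches); // drop $matches[0], the full-match strings
array_walk_recursive($matches, function (&$val) { $val = trim($val); });
list($array1, $array2) = $matches;

print_r($array1);
print_r($array2);
?>
</pre>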
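If you would rather keep each label next to its value instead of in two parallel arrays, the same pattern can be run with the PREG_SET_ORDER flag, which returns one match set per <b> element. This is a minimal sketch assuming the labels are unique (a repeated label overwrites the earlier entry); $pairs is an illustrative name, not part of the answer above.

```php
<?php
$subject = <<<LOD
<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
LOD;

// Same pattern as the answer: group 1 is the <b>...</b> element,
// group 2 is the text after it up to the next "<b".
$pattern = '~(<b>.*?</b>)((?>[^<]+|<(?!b))*)~';

$pairs = [];
// PREG_SET_ORDER: $matches[$i] holds the full match and both groups of match $i.
if (preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // strip_tags() removes the <b></b> markup; trim() drops surrounding spaces.
        $label = trim(strip_tags($m[1]));
        // Assumption: labels are unique, so a later duplicate would overwrite.
        $pairs[$label] = trim($m[2]);
    }
}
print_r($pairs);
```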