PHP: matching basic HTML with regular expressions (or any other way)


I have some HTML that looks like this:

    <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
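I'd like to split it into two arrays:

Array 1 - (this will contain everything inside the <b> tags)
Array 2 - (this will contain everything outside the <b> tags)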

I tried using regular expressions and preg_match_all, but I can't seem to get my head around them. Any help would be greatly appreciated.

Thanks


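<?php
// Sample input: each line is a <b>title</b> followed by plain text.
$string = '    <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)';
// Group 1: a <b>...</b> element whose content contains no "<".
// Group 2: the run of non-"<" characters after it, up to the next tag.
preg_match_all("#(<b>[^<]+</b>)([^<]+)#", $string, $matches);
print_r($matches);
?>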
Output:

Array
(
    [0] => Array
        (
            [0] => <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)

            [1] => <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...

            [2] => <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
        )

    [1] => Array
        (
            [0] => <b>This is a title: </b>
            [1] => <b>Some more text: </b>
            [2] => <b>Hello world!: </b>
        )

    [2] => Array
        (
            [0] =>  0091 + Two + 423 + Four + (Five, Six, Seven)

            [1] =>  Abc + Hi + Random + Text + (Hello, 522, Four)
    ...

            [2] =>  Test + Foo + 1122 + (120, 122, Four)
        )

)
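A minimal sketch of turning these matches into the two clean arrays the question asks for, assuming the bold text should lose its tags and surrounding whitespace. The names $titles and $texts are illustrative only, and the sample input drops the "..." placeholder line (in the output above it ends up inside $matches[2][1]).

```php
<?php
// Sample input (assumption: the "..." placeholder line is removed).
$html = "<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)\n"
      . "<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)\n"
      . "<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)";

// Same pattern as the answer: group 1 = <b>...</b>, group 2 = text up to the next "<".
preg_match_all('#(<b>[^<]+</b>)([^<]+)#', $html, $matches);

// Hypothetical names: $titles gets the bold text without tags, $texts the text after it.
$titles = array_map(function ($s) { return trim(strip_tags($s)); }, $matches[1]);
$texts  = array_map('trim', $matches[2]);

print_r($titles);
print_r($texts);
```

Using a closure with array_map (rather than an arrow function) keeps this working on PHP versions older than 7.4.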
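Don't use regular expressions to parse HTML. You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from what you expect, your code will break. See … for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged!

I'm already using the simple_html_dom library (mentioned in the link you posted). I was having particular trouble with this string, so I decided to use a regex, just for this case. For everything else I'll use the HTML parser library. Thanks for your input :)

阿克兰, this works perfectly! Thanks a lot. :) Any good recommendations for where I can learn more about regex?

Try it yourself, regex is easy to learn... search Google.

You can try the following: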
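<pre>
<?php

$subject =<<<LOD
<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
LOD;

// Group 1: a <b>...</b> element (the lazy .*? stops at the first </b>).
// Group 2: everything after it up to the next "<b": runs of non-"<" characters,
// or a single "<" not followed by "b" (so tags like </i> are kept, but note
// that this also stops at tags such as <br>).
$pattern = '~(<b>.*?</b>)((?>[^<]+|<(?!b))*)~';
preg_match_all($pattern, $subject, $matches);

// Drop $matches[0] (the full matches), keeping only the two capture groups.
array_shift($matches);
// Trim surrounding whitespace (e.g. trailing newlines) from every captured string.
array_walk_recursive($matches, function (&$val) { $val = trim($val); });
list($array1, $array2) = $matches;

print_r($array1); // the <b>...</b> parts
print_r($array2); // the text outside the <b> tags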
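Following the first comment's advice, here is a rough sketch of the same split done with a parser instead of a regex. It swaps in PHP's built-in DOMDocument (not the simple_html_dom library mentioned above), assumes the fragment can be wrapped in a <div> to give it a single root, and treats everything after a <b> element, up to the next one, as that title's text. $titles and $texts are illustrative names.

```php
<?php
$html = "<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)\n"
      . "<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)\n"
      . "<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)";

$doc = new DOMDocument();
// Collect libxml warnings instead of printing them (fragments often trigger some).
libxml_use_internal_errors(true);
// Assumption: wrapping the fragment in a <div> gives us one root to iterate over.
$doc->loadHTML('<div>' . $html . '</div>');
libxml_clear_errors();

$root = $doc->getElementsByTagName('div')->item(0);

$titles = [];
$texts  = [];
foreach ($root->childNodes as $node) {
    if ($node instanceof DOMElement && strtolower($node->nodeName) === 'b') {
        // Every <b> element starts a new entry.
        $titles[] = trim($node->textContent);
        $texts[]  = '';
    } elseif (count($texts) > 0) {
        // Text nodes (and any other inline elements) after a <b> belong to the current entry.
        $texts[count($texts) - 1] .= $node->textContent;
    }
}
$texts = array_map('trim', $texts);

print_r($titles);
print_r($texts);
```

Because it walks nodes rather than characters, an inline tag such as <i> inside the trailing text is folded into $texts via textContent instead of cutting the match short.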