PHP: Parsing the text inside HTML tags while ignoring the title value

php regex

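I'm trying to parse the information between HTML tags. Using a regular expression, how can I ignore the title values, since they differ from element to element, and parse only the text inside the tags?

HTML code: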

<p class=period>
<abbr class=dtstart title=2010>2010</abbr>
<abbr class=dtend title=2012>2012</abbr> 
</p>

The output should be: 2010, 2012

I'm using the following approach, and it works when title=2010:

$experience .= "<c:start_date>". trim($this->parse_text($tmp3[$i], "<abbr class=\"dtstart\" title=\"2010\">", "</abbr>"))."</c:start_date>\r\n";

I tried this: title=\.\ but it doesn't work! Any suggestions on which regular expression I should use?

Thanks a lot.

Regular expressions aren't designed for parsing HTML. You're better off using DOM/XPath:

$html = <<<HTML
<p class=period>
<abbr class=dtstart title=2010>2010</abbr>
<abbr class=dtend title=2012>2012</abbr> 
</p>
HTML;
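// Parse the markup and query it with XPath instead of a regex.
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// nodeValue is the text between the tags; the title attribute is never consulted.
$dtstart = $xpath->query("//abbr[contains(@class, 'dtstart')]")->item(0)->nodeValue;
$dtend = $xpath->query("//abbr[contains(@class, 'dtend')]")->item(0)->nodeValue;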

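How is parse_text defined? The regular expression would be title=\d+

The parse_text function:

function parse_text($str, $start, $end) {
    if (empty($str)) {
        return;
    }
    $pos_start = strpos($str, $start);
    $pos_end = strpos($str, $end, $pos_start + strlen($start));
    if ($pos_start !== false && $pos_end !== false) {
        $pos1 = $pos_start + strlen($start);
        $pos2 = $pos_end - $pos1;
        return substr($str, $pos1, $pos2);
    } else {
        return;
    }
}

It's worth noting that the parse_text function doesn't use a regular expression at all...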

// Fetch both elements with a single query, then unpack their text values.
$dates = $xpath->query("//abbr[contains(@class, 'dtstart') or contains(@class, 'dtend')]");
list($dtstart, $dtend) = array_map(function ($node) {
    return $node->nodeValue;
}, iterator_to_array($dates));
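
If a regular expression is still preferred, the title=\d+ idea from the comments can be sketched as follows. This is only a minimal illustration, not a general HTML parser: it assumes the attributes always appear in the order class, title, that the title is purely numeric, and that the quotes around attribute values are optional. The names $pattern and $dates are chosen here for illustration.

```php
<?php
// Sample markup from the question (attribute values unquoted, as shown there).
$html = <<<HTML
<p class=period>
<abbr class=dtstart title=2010>2010</abbr>
<abbr class=dtend title=2012>2012</abbr>
</p>
HTML;

// Match an <abbr> whose class is dtstart or dtend, accept any numeric title
// (optionally quoted), and capture only the text between the tags.
$pattern = '~<abbr\s+class="?(dtstart|dtend)"?\s+title="?\d+"?\s*>(.*?)</abbr>~s';

$dates = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // $m[1] is the class name, $m[2] is the element text.
        $dates[$m[1]] = trim($m[2]);
    }
}

echo implode(', ', $dates), "\n"; // prints "2010, 2012"
```

Because the title is matched with \d+ rather than a fixed value, the same pattern works for any year. It will break as soon as the markup changes shape (extra attributes, a different attribute order, nested tags), which is why the DOM/XPath approach is the more robust choice.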