PHP: How to get the string after a specific HTML DOM element


Here is the HTML:

<td width="551">
<p><strong>Full Time Faculty<br>
<strong></strong>Assistant Professor</strong></p>Doctorate of Business Administration<br><br>
<strong>Phone</strong>: +88 01756567676<br>
<strong>Email</strong>: frank.wade@email.com<br> 
<strong>Office</strong>: NAC739<br>
<br><p><b>Curriculum Vitae</b></p></td>

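Is there a way to get these 3 lines (Phone, Email, Office) by matching against the <strong> tags?

Maybe you can use an XPath function, such as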

$xml = new SimpleXMLElement($DomAsString);
$theText = $xml->xpath('//strong[. ="Phone"]/following-sibling::text()');

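You will also need some trimming to remove the ":" and, of course, to fix the DOM structure first, since the fragment as posted (with its unclosed <br> tags) is not well-formed XML.

You really don't need to parse it as HTML or walk the DOM tree. You can explode the HTML string into pieces and then strip the extra content from each piece to get what you need: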

<?php 

$str = <<<str
<td width="551">
<p><strong>Full Time Faculty<br>
<strong></strong>Assistant Professor</strong></p>Doctorate of Business Administration<br><br>
<strong>Phone</strong>: +88 01756567676<br>
<strong>Email</strong>: frank.wade@email.com<br>
<strong>Office</strong>: NAC739<br>
<br><p><b>Curriculum Vitae</b></p></td>
str;

// We explode $str and use '</strong>' as delimiter and get only the part of result that we need
$lines = array_slice(explode('</strong>', $str), 3, 3);
// Define a function to remove extra text from left and right of our so called lines
function stripLine($line) {
    // ltrim ' :' characters and remove everything after (and including) '<br>'
    return preg_replace('/<br>.*/is', '', ltrim($line, ' :'));
}
$lines = array_map('stripLine', $lines);

print_r($lines);

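Or use a regular expression directly:

// The parentheses create capture group 1, so $m[1] holds only the number
preg_match('|Phone</strong>: ([^<]+)|', $str, $m) or die('no phone');
$phone = $m[1];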
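
The same idea can be extended to all three fields at once. The following is only a sketch, assuming every field has the shape `<strong>Label</strong>: value<br>`; the function name `extractLabeledFields` is made up for illustration and does not come from the answers above.

```php
<?php

// The three relevant rows from the question's HTML.
$html = <<<HTML
<strong>Phone</strong>: +88 01756567676<br>
<strong>Email</strong>: frank.wade@email.com<br>
<strong>Office</strong>: NAC739<br>
HTML;

// Hypothetical helper: maps each <strong> label to the text that follows it.
function extractLabeledFields(string $html): array
{
    $fields = [];
    // Group 1: label inside <strong>...</strong>.
    // Group 2: everything after the colon up to the next tag.
    $pattern = '~<strong>([^<]+)</strong>\s*:\s*([^<]+)~';
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $fields[trim($match[1])] = trim($match[2]);
        }
    }
    return $fields;
}

print_r(extractLabeledFields($html));
```

Because the pattern requires a colon right after the closing `</strong>`, headings such as "Full Time Faculty" in the full fragment would not match. Like any regex over HTML, it breaks as soon as the markup changes shape.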

You may want to use DOMDocument instead.
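
A DOMDocument-based version might look like the sketch below. It assumes the `<td>` fragment from the question and wraps it in `<table><tr>` so the parser sees valid table markup. The helper name `textAfterStrong` is hypothetical, and the labels are assumed to contain no double quotes because they are placed directly into the XPath expression.

```php
<?php

$html = <<<HTML
<td width="551">
<p><strong>Full Time Faculty<br>
<strong></strong>Assistant Professor</strong></p>Doctorate of Business Administration<br><br>
<strong>Phone</strong>: +88 01756567676<br>
<strong>Email</strong>: frank.wade@email.com<br>
<strong>Office</strong>: NAC739<br>
<br><p><b>Curriculum Vitae</b></p></td>
HTML;

// Hypothetical helper: for each label, returns the text node that directly
// follows the <strong> element whose text equals that label.
function textAfterStrong(string $html, array $labels): array
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<table><tr>' . $html . '</tr></table>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($labels as $label) {
        // Assumption: $label contains no double quotes.
        $query = sprintf(
            '//strong[normalize-space(.)="%s"]/following-sibling::text()[1]',
            $label
        );
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            // Strip the leading ": " and surrounding whitespace.
            $result[$label] = trim($nodes->item(0)->nodeValue, " :\t\r\n");
        }
    }
    return $result;
}

print_r(textAfterStrong($html, ['Phone', 'Email', 'Office']));
```

Unlike SimpleXMLElement, `DOMDocument::loadHTML` tolerates unclosed `<br>` tags, so the fragment does not have to be repaired by hand first.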