PHP: How to get the string after a specific HTML DOM element
Here is the HTML:
<td width="551">
<p><strong>Full Time Faculty<br>
<strong></strong>Assistant Professor</strong></p>Doctorate of Business Administration<br><br>
<strong>Phone</strong>: +88 01756567676<br>
<strong>Email</strong>: frank.wade@email.com<br>
<strong>Office</strong>: NAC739<br>
<br><p><b>Curriculum Vitae</b></p></td>
Is there a way to get these 3 lines (Phone, Email, Office) by matching on the tags?

Maybe you can use an XPath function, like:
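// $DomAsString must be well-formed XML; the HTML above is not (unclosed <br> tags)
$xml = new SimpleXMLElement($DomAsString);
// [1] selects only the first text node after <strong>Phone</strong>, not every later one
$theText = $xml->xpath('//strong[. ="Phone"]/following-sibling::text()[1]');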
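plus some trimming to remove the ":", and of course fixing the DOM structure first.

You don't really need to parse this as HTML or work with a DOM tree. You can explode the HTML string into pieces and then strip the extra content from each piece to get what you need: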
<?php
$str = <<<str
<td width="551">
<p><strong>Full Time Faculty<br>
<strong></strong>Assistant Professor</strong></p>Doctorate of Business Administration<br><br>
<strong>Phone</strong>: +88 01756567676<br>
<strong>Email</strong>: frank.wade@email.com<br>
<strong>Office</strong>: NAC739<br>
<br><p><b>Curriculum Vitae</b></p></td>
str;
// We explode $str and use '</strong>' as delimiter and get only the part of result that we need
$lines = array_slice(explode('</strong>', $str), 3, 3);
// Define a function to remove extra text from left and right of our so called lines
function stripLine($line) {
// ltrim ' ' and ':' characters, then remove everything from the first '<br>' onward
return preg_replace('/<br>.*/is', '', ltrim($line, ' :'));
}
$lines = array_map('stripLine', $lines);
print_r($lines);
Or use a regular expression directly:
// The parentheses create capture group 1, which holds the phone number
preg_match('|Phone</strong>: ([^<]+)|', $str, $m) or die('no phone');
$phone = $m[1];
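The same regex idea can be extended to all three fields at once with `preg_match_all`. This is a sketch assuming the labels are exactly `Phone`, `Email` and `Office` and that each value runs up to the next `<`; the `~` delimiter is used because the pattern itself contains `|`.

```php
<?php
// Sample HTML from the question (nowdoc, so nothing is interpolated)
$str = <<<'HTML'
<td width="551">
<p><strong>Full Time Faculty<br>
<strong></strong>Assistant Professor</strong></p>Doctorate of Business Administration<br><br>
<strong>Phone</strong>: +88 01756567676<br>
<strong>Email</strong>: frank.wade@email.com<br>
<strong>Office</strong>: NAC739<br>
<br><p><b>Curriculum Vitae</b></p></td>
HTML;

// Group 1 = label (assumed fixed list), group 2 = value up to the next '<'
preg_match_all('~<strong>(Phone|Email|Office)</strong>:\s*([^<]+)~', $str, $m);

// Pair each label with its trimmed value, e.g. 'Phone' => '+88 01756567676'
$fields = array_combine($m[1], array_map('trim', $m[2]));
print_r($fields);
```

Unlike the `array_slice` approach, this does not depend on the position of the fields, and the result is keyed by label.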
You might want to use DOMDocument instead.
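A minimal sketch of the DOMDocument route, assuming the same sample HTML. `DOMDocument::loadHTML` is lenient about the unclosed `<br>` tags, so the markup does not have to be fixed by hand, and `DOMXPath` runs the same kind of `following-sibling::text()` query as the SimpleXML version. The label list and the colon trimming are assumptions based on this particular markup.

```php
<?php
$html = <<<'HTML'
<td width="551">
<p><strong>Full Time Faculty<br>
<strong></strong>Assistant Professor</strong></p>Doctorate of Business Administration<br><br>
<strong>Phone</strong>: +88 01756567676<br>
<strong>Email</strong>: frank.wade@email.com<br>
<strong>Office</strong>: NAC739<br>
<br><p><b>Curriculum Vitae</b></p></td>
HTML;

// Collect libxml warnings about the malformed markup instead of printing them
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$fields = [];
foreach (['Phone', 'Email', 'Office'] as $label) {
    // First text node right after the <strong> whose text is exactly the label
    $nodes = $xpath->query('//strong[. = "' . $label . '"]/following-sibling::text()[1]');
    if ($nodes->length > 0) {
        // The text node looks like ": value"; drop the colon and surrounding spaces
        $fields[$label] = trim(ltrim(trim($nodes->item(0)->textContent), ':'));
    }
}
print_r($fields);
```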