Extracting strings from HTML code with PHP


php regex


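This expression only captures the values between the angle brackets (><) when they are numeric. I want to capture them whatever they contain.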

function GetProducts($file){
    $regex = "|class=\"producto\"[^>]+>([0-9]*)</[^>]+>|U";
    if(!is_file($file)) return false;
    preg_match_all($regex,file_get_contents($file), $result);
    foreach($result[1] as $key =>$value) $result[$key] = (int) $value;
    return $result;
}

This might work, but as people say, parsing HTML with regular expressions is problematic.

 # class="producto"[^>]+>([^<]*)</[^>]+>

 class="producto" [^>]+ >
 ( [^<]* )
 </ [^>]+ >
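As a rough illustration of how the pattern above might be used, here is a minimal sketch. The sample markup in $html is invented for the example and is an assumption about what the real page looks like; note that the [^>]+ after the class attribute requires at least one more character (such as another attribute) before the closing bracket.

```php
<?php
// Hypothetical sample markup; the real page structure is an assumption.
$html = '<a class="producto" href="/p/1">1027</a>'
      . '<a class="producto" href="/p/2">Blue Widget</a>'
      . '<a class="otro" href="/p/3">skip me</a>';

// [^<]* captures any text up to the next tag, not only digits.
$regex = '|class="producto"[^>]+>([^<]*)</[^>]+>|';

preg_match_all($regex, $html, $result);

// $result[1] holds the first capture group of every match.
$values = array_map('trim', $result[1]);

print_r($values);
```

The captured values are left as strings here; casting them with (int) would turn any non-numeric text into 0.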


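You can use a regular expression like this. The hyphen is placed last in the character class so that it is read as a literal; written as [\w\s-\(\)], the class fails to compile on PHP 7.3 and later, which reject it as an invalid range.

([\w\s()-]+)</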
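A whitelist character class like this one only matches text made of word characters, whitespace, parentheses, and hyphens. The sketch below shows that behaviour on invented sample markup; the class="producto" prefix and the $html contents are assumptions added for illustration, and the hyphen is placed last in the class so that it is read as a literal.

```php
<?php
// Hypothetical sample markup, for illustration only.
$html = '<a class="producto" href="/p/1">834006 (oferta)</a>'
      . '<a class="producto" href="/p/2">5601</a>'
      . '<a class="producto" href="/p/3">A &amp; B</a>';

// Whitelist: word characters, whitespace, parentheses, and a literal hyphen (last).
$regex = '~class="producto"[^>]+>([\w\s()-]+)</~';

preg_match_all($regex, $html, $result);

// The third link is skipped because "&" and ";" are not in the whitelist.
print_r($result[1]);
```

This is the trade-off of a whitelist: it avoids capturing unexpected markup, but any value containing an entity or punctuation outside the class is silently dropped.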

You asked for a pure regex solution, but regular expressions are poorly suited to parsing HTML. A DOM parser with XPath is more robust:
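// Reducer: keep only the leading run of digits from each link's text.
function _matcher ($m, $str) {
  if (preg_match('/^\d+/', $str, $matches))
    $m[] = $matches[0];
  return $m;
}

// $html is assumed to hold the page markup as a string.
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$vals = array(); // initialize so array_reduce() always receives an array
foreach ($xpath->query('//a[@class="producto"]') as $link) {
   $vals[] = $link->nodeValue;
}

print_r(array_reduce($vals, '_matcher', array()));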

Output:

Array
(
    [0] => 1027
    [1] => 5611
    [2] => 396
    [3] => 834006
    [4] => 5601
    [5] => 2182
    [6] => 1458
)

Comments:

What output do you want?

Array ( [1]=>1027 [2]=>5611 [3]=>5396 [4]=>834006 [5]=>5601 [6]=>2182 [7]=>1458 )

It is rich to answer by citing a post that condemns parsing HTML with regular expressions. While asking a regex to parse arbitrary HTML is like asking Paris Hilton to write an operating system, it is sometimes appropriate to parse a limited, known set of HTML. That is the case here.

Yes, I could throw down a 15k regex to parse HTML and it would still have problems, especially with entities and substitutions. I think that holds even for a known set of HTML.
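To try the DOM and XPath approach end to end, a self-contained sketch might look like the following. The sample markup, the $ids variable, and the token-based class test are assumptions added for illustration; the XPath expression matches "producto" even when the element carries several classes, which the exact @class="producto" comparison would miss.

```php
<?php
// Hypothetical sample markup; the element and attribute layout is assumed.
$html = '<html><body>'
      . '<a class="producto" href="/p/1">1027 Widget</a>'
      . '<a class="producto destacado" href="/p/2">5611</a>'
      . '<a class="otro" href="/p/3">9999</a>'
      . '</body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Match "producto" as one whole token of the class attribute.
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " producto ")]';

$ids = array();
foreach ($xpath->query($query) as $link) {
    // Keep the leading digits of the link text, if any.
    if (preg_match('/^\s*(\d+)/', $link->nodeValue, $m)) {
        $ids[] = (int) $m[1];
    }
}

print_r($ids);
```

Because the parser decodes entities and handles attribute order for you, this version should tolerate markup changes that would break a hand-written pattern.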