PHP: Getting URLs with preg_match_all (simple)

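I really don't know why I can't get some URLs from a website's source code with preg_match. Maybe I'm doing something wrong; I've tried many approaches, but I can't get them.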

The problem is that I'm trying to get only the URL from source code that looks like this:

<h2><a href="http://www.website.com/index.php" h="ID=SERP,5085.1">Website name</a></h2>

So the value I want to get is the URL in the href attribute.

This is how I did it:

preg_match_all('/<h2><a href=".*">/',$text,$m) ;

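$text is the source code. It is a very long page source, so I only want to get the href of the <a> tag inside the <h2> tag. I hope you can help me.

Try this: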

<?php
$string = '<h2><a href="http://www.website.com/index.php" h="ID=SERP,5085.1">Website name</a></h2>';
// Replace the whole string with the captured href value (works for a single tag like this one)
$url = preg_replace('#.*href="([^"]+)".*#', '$1', $string);
print_r($url);
?>
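The answer above uses preg_replace, which turns one string into one URL. Since the question asks about preg_match_all, here is a minimal sketch (not from the original thread) that adds a capture group so that $m[1] holds only the URLs. The attempt in the question has no capture group, so $m only contains whole tags, and its greedy .* runs up to the last `">` on the line, pulling in other attributes such as h="ID=SERP,5085.1". The sample $text below is an assumption standing in for the fetched page.

```php
<?php
// Hypothetical sample input; in the question, $text is a full page source.
$text = <<<HTML
<h2><a href="http://www.website.com/index.php" h="ID=SERP,5085.1">Website name</a></h2>
<p><a href="http://www.ignored.com">not inside an h2</a></p>
<h2><a href="http://www.example.com">Example site</a></h2>
HTML;

// ([^"]+) captures the href value and cannot run past its closing quote,
// unlike the greedy .* in the original attempt.
preg_match_all('/<h2><a href="([^"]+)"/', $text, $m);

// $m[0] holds the full matches, $m[1] the first capture group (the URLs).
$urls = $m[1];
print_r($urls);
```

This pattern still assumes that href is the first attribute and that <a> directly follows <h2>, which is exactly the kind of assumption the DOM-based answer below avoids.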

You're asking for a regular expression here, but regex is not the right tool for parsing HTML. Use a DOM parser for this:

$html = <<<DATA
<h2><a href="http://www.website.com/index.php" h="ID=SERP,5085.1">Website name</a></h2>
<h2><a href="http://www.example.com">Example site</a></h2>
<h1><a href="http://www.bar.com">Bar</a></h1>
<a href="http://www.foo.com">foo</a>
DATA;

$dom = new DOMDocument;
$dom->loadHTML($html); // Load your HTML data..

$xpath = new DOMXPath($dom);

$links = []; // initialise so print_r() works even when nothing matches
foreach ($xpath->query("//h2/a") as $tag) {
   $links[] = $tag->getAttribute('href');
}

print_r($links);
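To illustrate why a fixed regex is brittle here, the sketch below uses hypothetical markup with an extra attribute before href and a line break between <h2> and <a>. A pattern tied to the exact markup finds nothing, while the same //h2/a XPath query still returns both links.

```php
<?php
// Hypothetical markup: same kind of links, but formatted differently.
$html = <<<HTML
<h2><a class="title" href="http://www.website.com/index.php">Website name</a></h2>
<h2>
  <a href="http://www.example.com">Example site</a>
</h2>
HTML;

// A pattern that assumes href comes first and <a> directly follows <h2>.
preg_match_all('/<h2><a href="([^"]+)"/', $html, $m);
$regexUrls = $m[1]; // empty for this markup

// The DOM/XPath approach ignores attribute order and whitespace.
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$domUrls = [];
foreach ($xpath->query('//h2/a') as $tag) {
    $domUrls[] = $tag->getAttribute('href');
}

print_r($regexUrls);
print_r($domUrls);
```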

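What's wrong with the code you already have?

Don't understand the question :/

Not sure what that comment means. What output does your preg_match_all currently produce? In other words, what is it doing wrong?

Indeed, regex is not the right tool. +1 :)

I get the source code like this: $text=file_get_contents(""); so I didn't give a very good example, and I didn't know about DOMXPath.

@hwnd I already tried it, and it gives:

Warning: DOMDocument::loadHTML(): Tag header invalid in Entity, line: 9 in /home/nyox/public_html/index.php on line 87
Warning: DOMDocument::loadHTML(): Tag nav invalid in Entity, line: 9 in /home/nyox/public_html/index.php on line 87
Warning: DOMDocument::loadHTML(): Tag nav invalid in Entity, line: 16 in /home/nyox/public_html/index.php on line 87
Warning: DOMDocument::loadHTML(): Tag footer invalid in Entity, line: 16 in /home/nyox/public_html/index.php on line 87

@hwnd I remember trying DOMDocument for a project five years ago, and it was very slow: just loading an Orkut page took about 1 second, so I had to switch to another solution. Do you know whether this performance issue has been fixed?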

Output of the DOMXPath example above:

Array
(
    [0] => http://www.website.com/index.php
    [1] => http://www.example.com
)
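The loadHTML() warnings quoted in the comments come from libxml's HTML parser not recognising HTML5 elements such as <header>, <nav> and <footer>; the page is still parsed and those tags are kept as elements. A minimal sketch, assuming a hypothetical HTML5 page in place of the file_get_contents() result, that collects the parser errors with libxml_use_internal_errors() instead of printing them as warnings:

```php
<?php
// Hypothetical HTML5 page standing in for the fetched source.
$html = <<<HTML
<!DOCTYPE html>
<html>
<body>
<header><nav><a href="http://www.nav-link.com">Home</a></nav></header>
<h2><a href="http://www.website.com/index.php" h="ID=SERP,5085.1">Website name</a></h2>
<footer>Footer text</footer>
</body>
</html>
HTML;

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

// Grab (or just discard) the collected parse errors, then restore the old setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$links = [];
foreach ($xpath->query('//h2/a') as $tag) {
    $links[] = $tag->getAttribute('href');
}

print_r($links);
```

$errors holds LibXMLError objects, so they can be logged rather than thrown away if the markup problems matter.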