Extract content from links into a PHP array


php html arrays regex


How can I extract the content from a list of unknown links? OK, I have this:

<div class="unknown_class">
    <a title="The title x" href="link1.html">This is the content I need 1</a><br>
    <a title="The title y" href="another-link.html">This is the content I need 2</a><br>
    <a title="The title z" href="something-else.html">This is the content I need 3</a><br>
</div>

<a title="The title 0" href="something.html">I dont need this</a>

Thank you very much for your help.

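You can use preg_match_all().

The best way is to use a DOM parser. See the linked duplicate. Something like this would work: $links = $dom->getElementsByTagName('a'); foreach ($links as $link) { $arr[] = $link->nodeValue; }

@AmalMurali, no, the HTML comes from an invalid HTML document, so a DOM parser won't work. I also updated my answer. There are some links I don't need. The links I need are followed by a <br> tag.

@Smartik The HTML you posted looks pretty good; maybe you should post a link to the full code?

@Jack The full HTML is a complete page. It includes the doctype, head, and body tags, plus other inline CSS/JS. A DOM parser won't validate it. Andy Truong's answer is exactly what I need. Thanks for your attention.

@Smartik It doesn't have to validate it; libxml will try to make sense of the document... but if you don't want to post it, that's of course up to you.

Thanks. Works exactly as I wanted. Accepted.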
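
For comparison, the DOM approach suggested in the comments might look roughly like the sketch below. It is not the accepted answer. It assumes the wanted links can be recognised by the <br> element that immediately follows them, as the asker describes, and it uses an XPath query instead of getElementsByTagName() so the unwanted link can be filtered out. The variable names and the XPath expression are illustrative choices.

```php
<?php
// Sample markup copied from the question; a real page is assumed to be messier.
$html = '<div class="unknown_class">
    <a title="The title x" href="link1.html">This is the content I need 1</a><br>
    <a title="The title y" href="another-link.html">This is the content I need 2</a><br>
    <a title="The title z" href="something-else.html">This is the content I need 3</a><br>
</div>

<a title="The title 0" href="something.html">I dont need this</a>';

// Collect libxml parse warnings internally instead of printing them,
// since imperfect HTML would otherwise produce noisy output.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Assumption: a wanted link is an <a> whose next sibling element is a <br>.
$texts = [];
foreach ($xpath->query('//a[following-sibling::*[1][self::br]]') as $link) {
    $texts[] = trim($link->nodeValue);
}

print_r($texts);
```

Because libxml tries to recover from malformed markup, this may still work on pages that do not validate, although that depends on how badly the real document is broken.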
Array(
    'This is the content I need 1',
    'This is the content I need 2',
    'This is the content I need 3'
)

$html = '<div class="unknown_class">
    <a title="The title" href="link1.html">This is the content I need 1</a>
    <a title="The title" href="another-link.html">This is the content I need 2</a>
    <a title="The title" href="something-else.html">This is the content I need 3</a>
</div>';

preg_match_all('`<a[^>]+>([^<]+)</a>`', $html, $matches);
print_r($matches[1]);
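
Run against the full markup from the question, the pattern above would also capture the link outside the div. A possible variation, assuming as the asker says that every wanted link is followed by a <br> tag, is to require that tag in the pattern. This is a sketch under that assumption, not part of the accepted answer; the variable names are illustrative.

```php
<?php
// Full sample from the question, including the link that should be skipped.
$page = '<div class="unknown_class">
    <a title="The title x" href="link1.html">This is the content I need 1</a><br>
    <a title="The title y" href="another-link.html">This is the content I need 2</a><br>
    <a title="The title z" href="something-else.html">This is the content I need 3</a><br>
</div>

<a title="The title 0" href="something.html">I dont need this</a>';

// Same capture group as before, but the match only succeeds when the closing
// </a> is followed (after optional whitespace) by <br>, <br/> or <br />.
preg_match_all('`<a[^>]+>([^<]+)</a>\s*<br\s*/?>`i', $page, $found);

print_r($found[1]);
```

Like any regex over HTML, this relies on the links containing plain text only; nested tags inside an <a> would cause that link to be skipped.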