PHP: How to get links with specific content using Simple HTML DOM


php web-scraping


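I want to get "/contact/new" from a link such as <a href="/contact/new">Contact us</a>. The condition is: if the link text is "Contact" or "Contact us", get the href value. The link has no class.

How can I do this?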

Using regex and PHP:

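$text = '<a href="/contact/new">Contact us</a>';

// Match anchors whose text is exactly "Contact us" or "Contact"; group 1 captures the href.
// (?:...|...) is alternation; square brackets would only define a character class.
preg_match_all('~<a href="([^"]*)">(?:Contact us|Contact)</a>~', $text, $matches);
foreach ($matches[1] as $href) {
    // Do whatever you want with the href attribute
    echo $href;
}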
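A note on the pattern syntax, since it is easy to get wrong: in PCRE, square brackets form a character class, so [Contact us|Contact]* matches any run of those individual characters rather than either phrase. The sketch below contrasts the two forms; the test strings are made up for illustration only.

```php
<?php
// Character class: matches any run (including an empty one) of the
// characters C, o, n, t, a, c, space, u, s and |.
$charClass = '~^[Contact us|Contact]*$~';

// Grouped alternation: matches exactly one of the two phrases.
$alternation = '~^(?:Contact us|Contact)$~';

// Illustrative inputs (assumed for this sketch).
foreach (['Contact us', 'Contact', 'scout', ''] as $text) {
    printf(
        "%-14s class=%d alternation=%d\n",
        var_export($text, true),
        preg_match($charClass, $text),
        preg_match($alternation, $text)
    );
}
```

Here 'scout' and the empty string satisfy the character-class pattern but not the alternation, which is why the grouped form is the safer choice.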

I have solved it with this code, evidently after getting help from @Matias Cerrotta:

foreach ($dom->find('a') as $element) {
    echo $element->plaintext . "\n";
}
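The loop above prints the text of every link. To narrow it down to the links in question, one possible sketch is to compare each link's text against the wanted labels and collect the matching href values. The helper name hrefsByLinkText is hypothetical, and the stand-in objects only mimic the plaintext and href properties that Simple HTML DOM nodes expose, so that the snippet runs without the library.

```php
<?php
// Hypothetical helper: returns the hrefs of anchors whose trimmed text is one of $labels.
// $anchors is anything iterable whose items expose ->plaintext and ->href,
// as the nodes returned by Simple HTML DOM's $dom->find('a') do.
function hrefsByLinkText(iterable $anchors, array $labels): array
{
    $hrefs = [];
    foreach ($anchors as $anchor) {
        if (in_array(trim($anchor->plaintext), $labels, true)) {
            $hrefs[] = $anchor->href;
        }
    }
    return $hrefs;
}

// Stand-in objects (assumption) so the sketch runs on its own;
// with Simple HTML DOM you would pass $dom->find('a') instead.
$anchors = [
    (object) ['plaintext' => 'Contact us', 'href' => '/contact/new'],
    (object) ['plaintext' => 'About', 'href' => '/about'],
];

$result = hrefsByLinkText($anchors, ['Contact', 'Contact us']);
print_r($result);
```

The comparison is exact and case-sensitive; swapping in_array for a case-insensitive check would be a small change if the page is inconsistent about capitalisation.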

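This can be done using SimpleXML and XPath.

You will need to adapt how the page is loaded into SimpleXML: use file_get_contents or another method to read the page into a variable, then pass that variable in.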
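For the loading step, note that simplexml_load_string expects well-formed XML, which real-world HTML often is not. One possible workaround, not part of the original answer, is to parse the page with DOMDocument::loadHTML first and convert the result with simplexml_import_dom. The helper name htmlToSimpleXml, the sample markup and the placeholder URL below are assumptions made for this sketch.

```php
<?php
// Hypothetical helper: parse possibly malformed HTML into a SimpleXMLElement.
function htmlToSimpleXml(string $html): ?SimpleXMLElement
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }
    $xml = simplexml_import_dom($dom);
    return $xml instanceof SimpleXMLElement ? $xml : null;
}

// Inline sample so the sketch runs offline; for a live page you might use
// $html = file_get_contents('https://example.com/'); // placeholder URL
$html = '<html><body><p>Unclosed paragraph<br><a href="/contact/new">Contact us</a></body></html>';

$xml = htmlToSimpleXml($html);
$links = $xml !== null ? $xml->xpath('//a[contains(text(), "Contact")]') : [];
$href = !empty($links) ? (string) $links[0]['href'] : null;
echo $href, "\n";
```

When fetching a live page, remember that file_get_contents returns false on failure, so check its result before parsing.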

I have put together a working mock-up below:

<?php
$html = '
<a href="/contact/new">Contact us</a>
';

//Replace with your loading logic here
$xml = simplexml_load_string($html);

//Perform the search
$search = $xml->xpath('//a[contains(text(), "Contact us") or contains(text(), "Contact")]');

//Check the results have at least one value (empty() also covers a false return from xpath() on error)
if (!empty($search))
{
    //Get first item
    $item = $search[0];

    //Get item attributes
    $attributes = $item->attributes();

    //Output the HREF attribute (need an existence check here (isset))
    echo $attributes['href'];
}


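The xpath method returns an array of matches. If it returns more than one result you will need to filter them; in the example I take the first match and output that node's href attribute.

The search finds all <a> tags, regardless of their position in the string/document, and checks whether their text contains "Contact us" or "Contact".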
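When several links match, one way to filter them is to loop over the whole result array and keep only the links whose text is exactly one of the wanted labels. The following is a minimal sketch under that assumption; the sample markup is invented for illustration.

```php
<?php
// Invented sample with one wanted link and one near miss.
$html = '<div>
  <a href="/contact/new">Contact us</a>
  <a href="/contact/list">Contacts directory</a>
  <a href="/about">About</a>
</div>';

$xml = simplexml_load_string($html);

// contains() is a substring test, so "Contacts directory" matches too.
$matches = $xml->xpath('//a[contains(text(), "Contact")]');

$hrefs = [];
foreach ($matches ?: [] as $link) {
    // Narrow the substring matches down to the exact labels.
    $text = trim((string) $link);
    if ($text === 'Contact' || $text === 'Contact us') {
        $hrefs[] = (string) $link['href'];
    }
}

print_r($hrefs);
```

The exact-text comparison in PHP is what rejects "Contacts directory" here; the XPath expression alone would have accepted it.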

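Note: XPath is case-sensitive. There are ways to make a match case-insensitive, but you have to implement them yourself or write additional conditions.

If you need case-insensitive matching, see another Stack Overflow question where this has already been discussed:

For example: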
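One commonly used XPath 1.0 technique for case-insensitive matching is to lower-case the node text with translate() before calling contains(). The sketch below assumes ASCII letters only, and the sample markup is invented for illustration.

```php
<?php
// Invented sample: the link text is upper-case on purpose.
$html = '<div><a href="/contact/new">CONTACT US</a><a href="/about">About</a></div>';
$xml = simplexml_load_string($html);

// translate() maps each upper-case ASCII letter to its lower-case counterpart,
// so the comparison against "contact" ignores case.
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower = 'abcdefghijklmnopqrstuvwxyz';
$query = sprintf('//a[contains(translate(text(), "%s", "%s"), "contact")]', $upper, $lower);

$found = $xml->xpath($query);

$hrefs = [];
foreach ($found ?: [] as $link) {
    $hrefs[] = (string) $link['href'];
}

print_r($hrefs);
```

Non-ASCII letters would need to be added to both translate() arguments, since XPath 1.0 has no built-in lower-case function.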

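Do you mean parsing the <a> tags like a string with PHP?

Yes, parsing all the <a> tags.

jQuery has :contains, so you can just use that. Your regex also has some major problems.

@pguardiario I know I have to select the a tags with the text Contact and Contact us. That is why I used html(). I have been testing the regex, but I really don't know which major problems you are talking about.

I suggest you post a "what's wrong with this regex" question. You will get a lot of good feedback that way. I can't go into more detail in a comment.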