Searching an HTML document for Facebook page URLs and Twitter URLs in PHP

php html curl

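I am scraping data from a number of websites, and in the returned HTML I want to extract the Facebook page link and the Twitter account link, if they are present. A sample of the fetched HTML is shown below. Note: I am using the cURL module to fetch the data.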

<a href="https://www.facebook.com/Example-page-16149277784545354/" target="_blank">
<div class="template asset" data-id="4722053" contenteditable="false">
<figure>
........
</figure>
</div>
</a>

I need the Facebook page link from inside the "href" attribute, and likewise the Twitter account link.

I have not tested this code, but it is a rough approach. Watch out for the loop running forever; please test it and correct any mistakes.

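<?php
$str = file_get_contents($url); // $url is the page to scan
$links = [];
// Handle both quoting styles: href='...' and href="..."
foreach (["'", '"'] as $quote) {
    $needle = 'href=' . $quote;
    $offset = 0;
    // strpos() returns false when there is no match, so compare with !== false
    while (($start = strpos($str, $needle, $offset)) !== false) {
        $start += strlen($needle);
        $end = strpos($str, $quote, $start);
        if ($end === false) {
            break; // unterminated attribute value
        }
        $link = substr($str, $start, $end - $start);
        $offset = $end + 1;
        // now check if the link is facebook, twitter etc.
        $links[] = $link;
    }
}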
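
The "check if the link is facebook, twitter etc." step could be sketched as below. This is only one possible approach: the function name `classifySocialLink` is hypothetical, and it assumes that comparing the URL host against `facebook.com` and `twitter.com` (including subdomains such as `www.`) is enough for your case.

```php
<?php
// Hypothetical helper: returns 'facebook', 'twitter', or null for any other link.
function classifySocialLink(string $link): ?string
{
    $host = parse_url($link, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // relative link or malformed URL
    }
    $host = strtolower($host);
    // Accept the bare domain and any subdomain (www., m., ...).
    foreach (['facebook' => 'facebook.com', 'twitter' => 'twitter.com'] as $name => $domain) {
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return $name;
        }
    }
    return null;
}
```

Comparing the parsed host rather than searching the whole string avoids false positives such as a link to another site that merely mentions facebook.com in its path or query string.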

You can use a regex for the check. Here is an example for the Facebook link:
$testString = '<a href="https://www.facebook.com/Example-page-16149277784545354/" target="_blank">
<div class="template asset" data-id="4722053" contenteditable="false">
<figure>
........
</figure>
</div>
</a>';

$facebookPattern = '/"(http[s]{0,1}:\/\/www\.facebook\.com[^"]+)"/';
preg_match_all($facebookPattern, $testString, $matches);

print_r($matches[1]);
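
The same idea can be extended to cover Twitter as well. The following is a sketch, not a tested production pattern: the Twitter link in the sample markup is made up for illustration, and the pattern assumes the links appear inside quoted `href` attributes on the `facebook.com` or `twitter.com` host (with or without `www.`).

```php
<?php
// Sample markup; the Twitter link is a made-up example, not from the question.
$html = '<a href="https://www.facebook.com/Example-page-16149277784545354/" target="_blank">fb</a>
<a href=\'https://twitter.com/example_account\'>tw</a>';

// Match href values in single or double quotes whose host is facebook.com or twitter.com.
$pattern = '~href\s*=\s*(["\'])(https?://(?:www\.)?(facebook|twitter)\.com/[^"\']*)\1~i';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$found = ['facebook' => [], 'twitter' => []];
foreach ($matches as $m) {
    // $m[2] is the URL, $m[3] is the site name captured from the host.
    $found[strtolower($m[3])][] = $m[2];
}
print_r($found);
```

Using `PREG_SET_ORDER` keeps each URL together with the site name captured from it, so the results can be grouped without a second pass.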

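You can also use Simple HTML DOM, which provides an object-oriented interface.
You just pass the URL to a function, which fetches the HTML and parses it into an object. You can then call properties and methods on that object to access DOM elements.
For reference: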
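
A comparable object-oriented approach can be sketched with PHP's bundled `DOMDocument` class. Note that this is a swapped-in technique, not the Simple HTML DOM library mentioned above, and it assumes the DOM extension is enabled. The function name `findSocialLinks` is hypothetical; it takes an HTML string, so it works with whatever cURL returned.

```php
<?php
// Hypothetical helper built on PHP's bundled DOM extension (not Simple HTML DOM).
function findSocialLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = ['facebook' => [], 'twitter' => []];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        $host = strtolower((string) parse_url($href, PHP_URL_HOST));
        // Accept facebook.com / twitter.com and any of their subdomains.
        if (preg_match('/(^|\.)(facebook|twitter)\.com$/', $host, $m)) {
            $links[$m[2]][] = $href;
        }
    }
    return $links;
}

$sample = '<a href="https://www.facebook.com/Example-page-16149277784545354/" target="_blank">
<div class="template asset" data-id="4722053" contenteditable="false"></div></a>';
print_r(findSocialLinks($sample));
```

Parsing the document first means the attribute quoting style and spacing no longer matter, which is the main advantage over string searching.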
A scraping tool such as BeautifulSoup, written in Python, would make this much easier.