JavaScript/PHP: How do I get all hyperlinks from a specific div on a given page?

javascript php

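I am trying to get all the news link URLs from some of the divs on this site. After fetching everything I looked at the page source, but there is nothing in it, even though some data is displayed. Can anyone who knows PHP, Array() and JS help me?

Here is the code I use to get the content: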

$html = file_get_contents("https://qc.yahoo.com/");
// file_get_contents() returns FALSE on failure; test $html, not an undefined $result
if ($html === FALSE) {
    die("?");
}
echo $html;

To find all the links in the HTML, you can use preg_match_all().
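As a minimal sketch of that idea, the snippet below runs preg_match_all() over a small inline string instead of a downloaded page. The sample markup and the pattern are assumptions made for illustration; the pattern accepts href values in either double or single quotes, but not unquoted ones.

```php
<?php
// Hypothetical sample markup standing in for a downloaded page.
$html = '<div id="news"><a href="/a.html">A</a> <a class="x" href=\'https://example.com/b\'>B</a></div>';

// Group 1 captures the opening quote, group 2 the URL.
// The back-reference \1 requires the closing quote to equal the opening one.
preg_match_all('/\bhref\s*=\s*(["\'])(.*?)\1/i', $html, $matches);

$links = $matches[2];
print_r($links);
```

A regular expression only sees text, so it also picks up href attributes on tags other than anchors, such as link elements.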

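The URL https://qc.yahoo.com/ uses gzip compression, so you have to detect that compression and decompress the content with gzdecode(). (The function must be available in your PHP build; it is provided by the zlib extension.)

gzip compression is indicated by the Content-Encoding: gzip HTTP header. You have to check that header, so you need curl or a similar method that gives you access to the headers. (file_get_contents() returns only the body, which here is the gzip-compressed content, and not the headers. To tell whether the body is compressed you need to read the headers.)

Here is a complete example: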

<?php

$url = "https://qc.yahoo.com/";

# download resource
$c = curl_init ($url);
curl_setopt ($c, CURLOPT_HEADER, true);
curl_setopt ($c, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec ($c);
$hsize = curl_getinfo ($c, CURLINFO_HEADER_SIZE);
curl_close ($c);

# separate headers from content
$headers = substr ($content, 0, $hsize);
$content = substr ($content, $hsize);

# check if content is compressed with gzip
$gzip = 0;
$headers = preg_split ('/\r?\n/', $headers);
foreach ($headers as $h)
{
    $pieces = preg_split ("/:/", $h, 2);
    $pieces2 = (count ($pieces) > 1);
    $enc = $pieces2 && (preg_match ("/content-encoding/i", $pieces[0]) );
    $gz = $pieces2 && (preg_match ("/gzip/i", $pieces[1]) );
    if ($enc && $gz)
    {
        $gzip = 1;
        break;
    }
}

# unzip content if gzipped
if ($gzip)
{
    $content = gzdecode ($content);
}


# find links
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);

# output results
echo "url = " . htmlspecialchars ($url) . "<br>";
echo "links found (" . count ($matches[1]) . "):" . "<br>";
$n = 0;
foreach ($matches[1] as $link)
{
    $n++;
    echo "$n: " . htmlspecialchars ($link) . "<br>";
}
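The example above decides from the response headers. As an alternative sketch, not taken from the answer, the body itself can be checked: every gzip stream starts with the two magic bytes 0x1f 0x8b. The helper name maybeGunzip() is hypothetical, and the compressed input is produced locally with gzencode() so that the snippet runs without a network connection.

```php
<?php
// maybeGunzip() is a hypothetical helper: it returns the decoded body when
// $body starts with the gzip magic bytes, and the unchanged body otherwise.
function maybeGunzip($body)
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);   // requires the zlib extension
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Simulate a compressed response locally instead of downloading one.
$plain = '<a href="/news/1">one</a>';
$compressed = gzencode($plain);

var_dump(maybeGunzip($compressed) === $plain);
var_dump(maybeGunzip($plain) === $plain);
```

Checking the header is the more direct approach when the headers are available. The magic-byte check may be handy when only the body is at hand, for instance after file_get_contents().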

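Suppose you want to extract all anchor tags and their hyperlinks from the given page.

There are two problems with calling file_get_contents on that URL: (1) the content encoding used for compression, i.e. gzip, and (2) SSL verification of the URL.

To get around the first problem, the gzip encoding, we will use CURL, as @gregn3 suggested in his answer. He did not, however, use CURL's ability to decompress gzipped content automatically.

For the second problem, you can either follow the guide or disable SSL verification through CURL's curl_setopt calls.

The code that extracts all links from the given page is then: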
<?php

$url = "https://qc.yahoo.com/";

# download resource
$c = curl_init ($url);
curl_setopt($c, CURLOPT_HTTPHEADER, ["Accept-Encoding:gzip"]);
curl_setopt ($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($c, CURLOPT_ENCODING , "gzip");
curl_setopt($c, CURLOPT_VERBOSE, 1);
curl_setopt($c, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($c, CURLOPT_SSL_VERIFYHOST, 0);
$content = curl_exec ($c);

curl_close ($c);

$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);

# output results
echo "url = " . htmlspecialchars ($url) . "<br>";
echo "links found (" . count ($matches[1]) . "):" . "<br>";
$n = 0;
foreach ($matches[1] as $link)
{
    $n++;
    echo "$n: " . htmlspecialchars ($link) . "<br>";
}
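A slightly more defensive variant of the download step could look like the sketch below. The helper name fetchPage() is hypothetical. It relies on two assumptions that differ from the answer's code: CURLOPT_ENCODING is set to an empty string, which makes curl send an Accept-Encoding header for every encoding it supports (so the explicit CURLOPT_HTTPHEADER line is not needed), and SSL verification is left at curl's defaults rather than switched off.

```php
<?php
// fetchPage() is a hypothetical helper: it returns the decoded response body
// and throws a RuntimeException when the transfer fails.
function fetchPage($url)
{
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
    // Empty string: advertise all encodings curl supports and let curl
    // decompress the response itself.
    curl_setopt($c, CURLOPT_ENCODING, "");

    $content = curl_exec($c);
    if ($content === false) {
        $error = curl_error($c);
        curl_close($c);
        throw new RuntimeException("Download failed: " . $error);
    }
    curl_close($c);
    return $content;
}

// Usage (performs a real network request, so it is left commented out):
// $content = fetchPage("https://qc.yahoo.com/");
```

Checking the return value of curl_exec() matters here because preg_match_all() would otherwise run on false and silently report zero links.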


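You can get all the links from the specified div. Make sure to put the div's id into the [@id='news_moreTopStories'] predicate, since the div is selected with an XPath query. You don't need a lot of code, only the short DOMDocument/DOMXPath part at the end of this page.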
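I'm having trouble understanding. It would help if you showed us a sample $html input and what you want to end up with once processing is done. Just a small sample, enough for us to understand your intent.
Hi @belletJuice, please check what I mean. Sorry, I'm not advanced in coding or in the names of the keywords. Please help ^^
Hi @gregn3, thank you for understanding my post; I didn't know which keywords to use. After using your code I got an error. I checked my setup: PHP 5.6.23, gzdecode OK, zlib extension loaded, but I get "PHP Fatal error: Call to undefined function gzip_inflate()". Why? Please help.
By the way, sorry, I wanted to upvote, but: "Thanks for the feedback! Votes cast by those with less than 15 reputation are recorded, but do not change the publicly displayed post score." #myreputation is bad T.T
For example, if I open the original site there are 10 links, but when I curl the site it only shows 5 links. How can I show all the links?
@ane Hi, to get all the links on the page you can try adjusting the regular expression. Possibly this one does not match all variants: "/href=\"([^\"]+)\"/i"
Then adding the curl option curl_setopt($c, CURLOPT_ENCODING, "gzip") will do the job. After that, curl itself decompresses the response.
Thanks @Deepak, I'm not very familiar with curl, but now I know this too. :)
No, I like this one. It helps me understand better. Thank you for the description and the knowledge, sir :* kiss hug .. awesome
By the way, do you have any socmed? I'd like to add you, sir :)
Sorry, I don't know what socmed is.
@DeepakChaudhary social media, sir :3
Ah.. :D I'm not very active on socmed.
Hi sir, thank you for helping as well, this gives me one more solution ^^
Yes, this is a better solution, but it doesn't seem to decode the gzipped content.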

$html = new DOMDocument();
@$html->loadHtmlFile('https://qc.yahoo.com/');
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@id='news_moreTopStories']//a/@href" );
foreach ($nodelist as $n){
echo $n->nodeValue."\n";
}
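One comment in the thread notes that loading the URL directly does not seem to decode the gzipped content. A possible workaround, sketched under assumptions, is to download and decode the page first and then hand the resulting string to DOMDocument::loadHTML(). In the snippet below, $content is hypothetical inline markup standing in for the decoded page body, so the example runs without a network connection.

```php
<?php
// Hypothetical markup; in practice $content would be the decoded page body.
$content = '<html><body>'
    . '<div id="news_moreTopStories"><a href="/news/1">One</a><a href="/news/2">Two</a></div>'
    . '<div id="other"><a href="/elsewhere">Other</a></div>'
    . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($content);
libxml_clear_errors();

// Select the href attribute of every anchor inside the target div only.
$xpath = new DOMXPath($doc);
$hrefs = [];
foreach ($xpath->query("//div[@id='news_moreTopStories']//a/@href") as $attr) {
    $hrefs[] = $attr->nodeValue;
}
print_r($hrefs);
```

Using libxml_use_internal_errors() in place of the @ operator keeps warnings about real-world markup out of the output while still leaving them available through libxml_get_errors() if needed.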