PHP DOM parsing of a URL returns nothing - Fatal编程技术网

I started out parsing a particular website with the following sample code:

<?php

# Use the Curl extension to query Google and get back a page of results
$url = "http://www.google.com";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$html = curl_exec($ch);
curl_close($ch);

# Create a DOM parser object
$dom = new DOMDocument();

# Parse the HTML from Google.
# The @ before the method call suppresses any warnings that
# loadHTML might throw because of invalid HTML in the page.
@$dom->loadHTML($html);

# Iterate over all the <a> tags
foreach($dom->getElementsByTagName('a') as $link) {
        # Show the <a href>
        echo $link->getAttribute('href');
        echo "<br />";
}
?>

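Then I changed the URL above to my target site (removed here for privacy reasons) and ran the script again, but got no output at all, whereas with the Google URL it works. So what is wrong with that site? Is there some protection against parsing, or is the page not standards-compliant? I hope someone can help.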
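When one URL works and another does not, a reasonable first step is to test the parsing half separately from the fetching half. The sketch below runs the same DOMDocument / getElementsByTagName('a') logic as the question on a hard-coded HTML string; the function name extract_hrefs and the sample markup are hypothetical. It uses libxml_use_internal_errors() instead of @ so parser warnings are collected rather than silently discarded, and it returns early on an empty string because loadHTML() rejects empty input (a ValueError in PHP 8). If this prints links while the fetched page does not, the problem is most likely in what cURL returned, not in the DOM code.

```php
<?php
// Hypothetical helper: return the href of every <a> element in an HTML string.
function extract_hrefs(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() refuses empty input
    }
    $dom = new DOMDocument();
    // Collect libxml warnings instead of suppressing them with @
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $hrefs[] = $link->getAttribute('href');
    }
    return $hrefs;
}

// Hard-coded sample markup, so no network request is involved
$sample = '<html><body><a href="/one">One</a> <a href="/two">Two</a></body></html>';
foreach (extract_hrefs($sample) as $href) {
    echo $href, "<br />\n";
}
```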

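The site seems to return only gzip-encoded responses. If so, $html ends up holding compressed bytes that DOMDocument cannot parse as HTML, so no <a> elements are found. You therefore need to set the matching cURL encoding option and send the matching encoding header: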

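# Replaces the cURL block from the question ($url is set as before)
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
# Ask for gzip and have cURL decompress the body before returning it
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
# Explicit Accept-Encoding header (it replaces the one CURLOPT_ENCODING sends;
# "br" is only decoded if libcurl was built with Brotli support)
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept-Encoding: gzip, deflate, br',
));
$html = curl_exec($ch);
curl_close($ch);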

This works for me.
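To see why an undecoded gzip body produces no output, the following sketch simulates the situation offline. gzencode() stands in for a compressed response body (an assumption for illustration; no real server is contacted), and the hypothetical helpers looks_gzipped() and count_links() check for the gzip magic bytes 0x1f 0x8b and count <a> elements the way the question's loop would. The raw compressed bytes yield no links, while the same bytes passed through gzdecode() parse normally; setting CURLOPT_ENCODING makes cURL perform that decoding step itself.

```php
<?php
// Hypothetical helper: a gzip stream always begins with the bytes 0x1f 0x8b
function looks_gzipped(string $body): bool
{
    return strncmp($body, "\x1f\x8b", 2) === 0;
}

// Hypothetical helper: count <a> elements as the question's foreach would see them
function count_links(string $html): int
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom->getElementsByTagName('a')->length;
}

$page = '<html><body><a href="https://example.com/">Example</a></body></html>';
// Stand-in for a compressed body that cURL returned without decoding it
$compressed = gzencode($page);

echo 'raw body gzipped? ', var_export(looks_gzipped($compressed), true), "\n";
echo 'links in raw body: ', count_links($compressed), "\n";
echo 'links after gzdecode(): ', count_links(gzdecode($compressed)), "\n";
```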

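Try printing the HTML to see what it actually returns, and also look at the HTTP response headers. That said, if the URL works in a browser but not with cURL, the most likely reason is that the server rejects requests that do not set a user agent; I have seen this several times before. Is your cURL extension enabled? I can retrieve the links with your code.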
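Following the advice to inspect the response headers and send a user agent, here is a sketch of a hypothetical helper fetch_with_headers(). It sets CURLOPT_USERAGENT (the UA string is a placeholder, not something the site is known to require), asks cURL to accept and decode every encoding it supports via CURLOPT_ENCODING set to an empty string, keeps the headers in the output with CURLOPT_HEADER, and splits them from the body using CURLINFO_HEADER_SIZE. A second hypothetical helper, header_value(), reads a single header such as Content-Encoding case-insensitively.

```php
<?php
// Hypothetical helper: fetch a URL with a placeholder User-Agent and return
// the status code, raw header block, body and any cURL error separately.
function fetch_with_headers(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_HEADER, true);   // prepend response headers to the output
    curl_setopt($ch, CURLOPT_ENCODING, '');   // accept and decode all supported encodings
    // Placeholder UA string (assumption); some servers refuse requests without one
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; link-parser-test)');

    $response = curl_exec($ch);
    if ($response === false) {
        $error = curl_error($ch);
        curl_close($ch);
        return ['status' => 0, 'headers' => '', 'body' => '', 'error' => $error];
    }
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return [
        'status'  => $status,
        'headers' => substr($response, 0, $headerSize),
        'body'    => substr($response, $headerSize),
        'error'   => '',
    ];
}

// Hypothetical helper: first value of a named header (case-insensitive), or null
function header_value(string $rawHeaders, string $name): ?string
{
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), $name) === 0) {
            return trim($parts[1]);
        }
    }
    return null;
}
```

For example, $r = fetch_with_headers($url); echo $r['status'], ' ', header_value($r['headers'], 'Content-Encoding') ?? '(none)'; shows at a glance whether the server refused the request or sent a compressed body. Because CURLOPT_FOLLOWLOCATION is not set, a redirect comes back as-is, so a 301/302 status is also worth checking.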