PHP web scraping redirect - Php, Curl, Web Scraping - Fatal编程技术网

I can scrape most websites with the code below, but some sites redirect me to distil_r_blocked.html.

These are the headers I get:

HTTP/1.1 200 OK
Date: Mon, 26 Jun 2017 20:30:12 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Cache-Control: no-cache
Cache-Control: private, no-cache, no-store, must-revalidate
Edge-Control: no-store, bypass-cache
Surrogate-Control: no-store, bypass-cache

Here is my code:

function file_get_contents_curl($target_url, $json = false) {
    $ch = curl_init();
    $headers = array();
    if ($json) {
        $headers[] = 'Content-type: application/json';
        $headers[] = 'X-HTTP-Method-Override: GET';
    }
    $options = array(
        CURLOPT_URL => $target_url,
        CURLOPT_HTTPHEADER => $headers,    // a flat list of "Name: value" strings, not array($headers)
        CURLOPT_TIMEOUT => 300,
        CURLOPT_FOLLOWLOCATION => 1,       // follow Location: redirects
        CURLOPT_AUTOREFERER => 1,          // set Referer automatically when following a redirect
        CURLOPT_RETURNTRANSFER => 1,       // return the response instead of printing it
        CURLOPT_HEADER => 1,               // response headers are prepended to the returned body
        CURLOPT_FOLLOWLOCATION => 1,       // duplicate key: repeats the setting above
        CURLOPT_MAXREDIRS => 3,
        CURLOPT_TIMEOUT => 10,             // duplicate key: overrides the 300 above
        CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9');
    curl_setopt_array($ch, $options);
    $response = curl_exec($ch);
    if ($response === false || curl_error($ch)) {
        curl_close($ch);
        return false;
    } else {
        curl_close($ch);
        return $response;
    }
}

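// Check whether the request itself fails at the cURL (transport) level
$ch = curl_init($target_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

if (curl_exec($ch) === false)
{
    echo 'Curl error: ' . curl_error($ch);
}
else
{
    echo 'Operation completed without any errors';
}
curl_close($ch);

// str_get_html() comes from the PHP Simple HTML DOM Parser library.
// Note: $data still starts with the response headers because CURLOPT_HEADER is 1.
$data = file_get_contents_curl($target_url);
$html = str_get_html($data);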
Is there something wrong with the redirect?


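Thanks,
Simon


Your cURL option CURLOPT_FOLLOWLOCATION is set to TRUE, which means cURL will follow redirects. Set it to 0 and it will not follow them. You also set this option twice, which is unnecessary.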
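To look at the redirect itself instead of the page it leads to, here is a minimal sketch. The helper names fetch_without_redirect() and location_header() are hypothetical. It turns CURLOPT_FOLLOWLOCATION off, sets each option only once, reads the status code with CURLINFO_RESPONSE_CODE, and takes the target from the Location header. It assumes the redirect is an HTTP 3xx response; a redirect done with JavaScript or a meta refresh would not show up this way.

```php
<?php
/**
 * Hypothetical helper: return the Location header value from a raw header
 * block, or null when there is none. The header name is matched
 * case-insensitively and only at the start of a line.
 */
function location_header(string $rawHeaders): ?string
{
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

/**
 * Hypothetical helper: request a URL once, without following redirects,
 * and report the status code and Location target.
 * Returns null on a transport-level cURL error.
 */
function fetch_without_redirect(string $url): ?array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER => true,          // keep headers so Location can be read
        CURLOPT_FOLLOWLOCATION => false, // set once, and off: stop at the 3xx
        CURLOPT_TIMEOUT => 10,
    ]);
    $response = curl_exec($ch);
    if ($response === false) {
        curl_close($ch);
        return null;
    }
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    curl_close($ch);

    // Only the header part is needed to find the redirect target.
    $headers = substr($response, 0, $headerSize);
    return ['status' => $status, 'location' => location_header($headers)];
}
```

A 3xx status with a location ending in distil_r_blocked.html would confirm that the site itself issues the redirect before any HTML is served.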


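As for retrieving the original content, you can't control that, because the server decides what response to send. At best you could try spoofing headers or using a different IP, but that is generally frowned upon, mainly because (in my opinion) it is sketchy behavior.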
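Since the server decides what comes back, the practical client-side step is to recognise the block page and stop instead of parsing it. The following is a minimal sketch under stated assumptions, not the original poster's code. After curl_exec(), curl_getinfo() with CURLINFO_EFFECTIVE_URL reports the last URL cURL fetched, and CURLINFO_HEADER_SIZE reports how many bytes of the returned string are headers (from every hop, when CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION are both on). The helper names split_response(), is_distil_block() and fetch_page() are hypothetical. The block check assumes the block page keeps the file name distil_r_blocked.html.

```php
<?php
/**
 * Hypothetical helper: split a CURLOPT_HEADER=1 response into
 * [headers, body] using curl_getinfo($ch, CURLINFO_HEADER_SIZE).
 */
function split_response(string $response, int $headerSize): array
{
    $headers = substr($response, 0, $headerSize);
    // (string) cast: older PHP versions return false when the offset is past the end.
    $body = (string) substr($response, $headerSize);
    return [$headers, $body];
}

/**
 * Hypothetical helper: true when the URL cURL finally landed on is the
 * block page. Assumes the page keeps the name distil_r_blocked.html.
 */
function is_distil_block(string $effectiveUrl): bool
{
    $path = (string) parse_url($effectiveUrl, PHP_URL_PATH);
    return basename($path) === 'distil_r_blocked.html';
}

/**
 * Hypothetical helper: fetch a URL, following redirects, and report where
 * cURL ended up. Returns null on a transport-level cURL error.
 */
function fetch_page(string $url): ?array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS => 3,
        CURLOPT_TIMEOUT => 10,
    ]);
    $response = curl_exec($ch);
    if ($response === false) {
        curl_close($ch);
        return null;
    }
    $effectiveUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    curl_close($ch);

    [$headers, $body] = split_response($response, $headerSize);
    return [
        'url' => $effectiveUrl,
        'blocked' => is_distil_block($effectiveUrl),
        'headers' => $headers,
        'body' => $body,
    ];
}
```

For example, `$page = fetch_page($target_url); if ($page !== null && !$page['blocked']) { $html = str_get_html($page['body']); }` skips blocked pages and hands only the body, not the headers, to the HTML parser.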

Perhaps the courteous thing is to respect their wishes instead of scraping sites that don't want to be scraped?