PHP: How do I find where a URL redirects to using cURL?


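I'm trying to make curl follow a redirect, but I can't get it to work properly. I have a string that I want to send as a GET parameter to a server and get the resulting URL.

Example:

String = Kobold Worker
Url = www.wowhead.com/search?q=Kobold+Worker

If you go to that URL, it redirects you to "www.wowhead.com/npc=257". I want curl to return this URL to my PHP code so that I can extract "npc=257" and use it.

Current code: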

function npcID($name) {
    $urltopost = "http://www.wowhead.com/search?q=" . $name;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1");
    curl_setopt($ch, CURLOPT_URL, $urltopost);
    curl_setopt($ch, CURLOPT_REFERER, "http://www.wowhead.com");
    curl_setopt($ch, CURLOPT_HTTPHEADER, Array("Content-Type:application/x-www-form-urlencoded"));
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

However, this returns www.wowhead.com/search?q=Kobold+Worker and not www.wowhead.com/npc=257.

I suspect PHP is returning before the external redirect happens. How can I fix this?

To make cURL follow a redirect, use:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

Erm... I don't think you're actually executing the cURL request... Try:

curl_exec($ch);

...after setting the options, and before the curl_getinfo() call.
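
Putting those two points together, a corrected version of the question's function might look like the sketch below. It assumes the search page answers with an HTTP redirect to a URL containing `npc=<digits>`. The helper name `extractNpcFragment`, the `urlencode()` call, the redirect cap of 10 and `CURLOPT_RETURNTRANSFER` are additions for illustration and were not part of the original code.

```php
<?php
// Hypothetical helper (not in the original post): pull "npc=257" out of a URL.
function extractNpcFragment(string $url): ?string
{
    return preg_match('~npc=\d+~', $url, $m) ? $m[0] : null;
}

function npcID(string $name): ?string
{
    $ch = curl_init('http://www.wowhead.com/search?q=' . urlencode($name));
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow Location: headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);         // assumed cap on the redirect chain
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it

    // Nothing is requested until curl_exec() runs; getinfo is only meaningful afterwards.
    $body = curl_exec($ch);
    if ($body === false) {
        return null;
    }

    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    return extractNpcFragment($finalUrl);
}
```

Calling `npcID('Kobold Worker')` would then return the `npc=...` fragment of the final URL, or `null` if the request fails or the final URL has no such fragment.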
EDIT: If you just want to find out where a page redirects to, I would use cURL to grab only the headers and extract the Location: header from them:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
if (preg_match('~Location: (.*)~i', $result, $match)) {
   $location = trim($match[1]);
}
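
One caveat with the pattern above is that it searches the whole response, body included, and leaves `$location` unset when nothing matches. A slightly more defensive sketch follows, assuming `$response` is the raw string returned by `curl_exec()` with `CURLOPT_HEADER` enabled. The function name `parseLocationHeader` is illustrative only.

```php
<?php
// Illustrative helper: return the Location header value from a raw
// "headers + body" response, or null when there is none.
function parseLocationHeader(string $response): ?string
{
    // Headers end at the first blank line; ignore everything after it.
    $parts = preg_split('~\r?\n\r?\n~', $response, 2);
    $headerBlock = $parts[0];

    // Header names are case-insensitive; anchor the match to the start of a line.
    if (preg_match('~^Location:[ \t]*(.*)$~mi', $headerBlock, $match)) {
        return trim($match[1]);
    }

    return null;
}
```

Returning `null` instead of leaving a variable undefined lets the caller distinguish "no redirect" from a real target.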

The above answer didn't work for me on one of my servers, something to do with basedir, so I reworked it a little. The code below works on all my servers.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$a = curl_exec($ch);
curl_close( $ch ); 
// the returned headers
$headers = explode("\n",$a);
// if there is no redirection this will be the final url
$redir = $url;
// loop through the headers and check for a Location: str
$j = count($headers);
for ($i = 0; $i < $j; $i++) {
    // if we find the Location header strip it and fill the redir var
    if (strpos($headers[$i], "Location:") !== false) {
        $redir = trim(str_replace("Location:", "", $headers[$i]));
        break;
    }
}
// do whatever you want with the result
echo $redir;

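
The chosen answer here is decent, but it is case-sensitive and doesn't protect against relative Location: headers (which some sites send) or pages that actually contain the phrase "Location:" in their content (which zillow currently does).

It's a bit sloppy, but a couple of quick edits make it somewhat smarter: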
function getOriginalURL($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $result = curl_exec($ch);
    $httpStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // if it's not a redirection (3XX), move along
    if ($httpStatus < 300 || $httpStatus >= 400)
        return $url;

    // look for a location: header to find the target URL
    if(preg_match('/location: (.*)/i', $result, $r)) {
        $location = trim($r[1]);

        // if the location is a relative URL, attempt to make it absolute
        if (preg_match('/^\/(.*)/', $location)) {
            $urlParts = parse_url($url);
            // start from an empty base and only use the parts parse_url() returned
            $baseURL = '';
            if (!empty($urlParts['scheme']))
                $baseURL = $urlParts['scheme'].'://';

            if (!empty($urlParts['host']))
                $baseURL .= $urlParts['host'];

            if (!empty($urlParts['port']))
                $baseURL .= ':'.$urlParts['port'];

            return $baseURL.$location;
        }

        return $location;
    }
    return $url;
}

Note that this still only goes one redirect deep. To go deeper, you would need to actually fetch the content and follow the redirects.
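
Going more than one hop deep can be done with a small loop that repeats the request until a non-3xx status comes back. The sketch below is one way to do it, under these assumptions: the names `fetchRedirect` and `followRedirects` are invented for this example, the default hop limit of 10 is arbitrary, and the fetch step is passed in as a callable so the loop can be exercised without network access. It reads the target from `CURLINFO_REDIRECT_URL`, which cURL reports when `CURLOPT_FOLLOWLOCATION` is disabled, rather than parsing headers by hand.

```php
<?php
// Hypothetical fetch step: one request that does NOT follow redirects.
// Returns [HTTP status code, redirect target or null].
function fetchRedirect(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // keep the body out of the output
    curl_exec($ch);

    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL);

    return [$status, $target ?: null];
}

// Follow the chain hop by hop; $fetch is injectable so it can be faked in tests.
function followRedirects(string $url, callable $fetch, int $maxHops = 10): string
{
    // Up to $maxHops redirects are allowed, which takes at most $maxHops + 1 requests.
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        [$status, $target] = $fetch($url);

        // Not a redirect (or no target reported): $url is the final URL.
        if ($status < 300 || $status >= 400 || $target === null) {
            return $url;
        }

        $url = $target;
    }

    throw new RuntimeException("More than $maxHops redirects");
}
```

With a live server this would be called as `followRedirects($url, 'fetchRedirect')`. Passing the fetch step as a parameter is a design choice for testability, not something the original answers do.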

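Sometimes you need to get the HTTP headers, but at the same time you don't want to return those headers.

This skeleton takes care of cookies and HTTP redirects using recursion. The main idea here is to avoid returning HTTP headers to the client code.
You can build a very robust curl class on top of it: add POST functionality, and so on.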

<?php

class curl {

    static private $cookie_file            = '';
    static private $user_agent             = '';
    static private $max_redirects          = 10;
    static private $followlocation_allowed = true;

    function __construct() {
        // set a file to store cookies
        self::$cookie_file = 'cookies.txt';

        // set some general User Agent
        self::$user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';

        if ( ! file_exists(self::$cookie_file) || ! is_writable(self::$cookie_file)) {
            throw new Exception('Cookie file missing or not writable.');
        }

        // check for PHP settings that unfits
        // correct functioning of CURLOPT_FOLLOWLOCATION
        if (ini_get('open_basedir') != '' || ini_get('safe_mode') == 'On') {
            self::$followlocation_allowed = false;
        }
    }

    /**
     * Main method for GET requests
     * @param  string $url URI to get
     * @return string      request's body
     */
    static public function get($url) {
        $process = curl_init($url);

        self::_set_basic_options($process);

        // this function is in charge of output request's body
        // so DO NOT include HTTP headers
        curl_setopt($process, CURLOPT_HEADER, 0);

        if (self::$followlocation_allowed) {
            // if PHP settings allow it use AUTOMATIC REDIRECTION
            curl_setopt($process, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($process, CURLOPT_MAXREDIRS, self::$max_redirects);
        } else {
            curl_setopt($process, CURLOPT_FOLLOWLOCATION, false);
        }

        $return = curl_exec($process);

        if ($return === false) {
            throw new Exception('Curl error: ' . curl_error($process));
        }

        // test for redirection HTTP codes
        $code = curl_getinfo($process, CURLINFO_HTTP_CODE);
        if ($code == 301 || $code == 302) {
            curl_close($process);

            try {
                // go to extract new Location URI
                $location = self::_parse_redirection_header($url);
            } catch (Exception $e) {
                throw $e;
            }

            // IMPORTANT return
            return self::get($location);
        }

        curl_close($process);

        return $return;
    }

    static function _set_basic_options($process) {
        curl_setopt($process, CURLOPT_USERAGENT, self::$user_agent);
        curl_setopt($process, CURLOPT_COOKIEFILE, self::$cookie_file);
        curl_setopt($process, CURLOPT_COOKIEJAR, self::$cookie_file);
        curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
        // curl_setopt($process, CURLOPT_VERBOSE, 1);
        // curl_setopt($process, CURLOPT_SSL_VERIFYHOST, false);
        // curl_setopt($process, CURLOPT_SSL_VERIFYPEER, false);
    }

    static function _parse_redirection_header($url) {
        $process = curl_init($url);

        self::_set_basic_options($process);

        // NOW we need to parse HTTP headers
        curl_setopt($process, CURLOPT_HEADER, 1);

        $return = curl_exec($process);

        if ($return === false) {
            throw new Exception('Curl error: ' . curl_error($process));
        }

        curl_close($process);

        if ( ! preg_match('#Location: (.*)#', $return, $location)) {
            throw new Exception('No Location found');
        }

        if (self::$max_redirects-- <= 0) {
            throw new Exception('Max redirections reached trying to get: ' . $url);
        }

        return trim($location[1]);
    }
}

You can use:

$redirectURL = curl_getinfo($ch, CURLINFO_REDIRECT_URL);

Add this line to the curl initialization:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

and use getinfo before curl_close:

$redirectURL = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

There are a lot of regexes here, and as much as I like them, this approach may be more stable for me:

$resultCurl = curl_exec($curl); // get curl result
// Optional line if you want to store the http status code
$headerHttpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);

// let's use dom and xpath
$dom = new \DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($resultCurl, LIBXML_HTML_NODEFDTD);
libxml_use_internal_errors(false);
$xpath = new \DOMXPath($dom);
$head = $xpath->query("/html/body/p/a/@href");
$newUrl = $head[0]->nodeValue;

The location part is a link in the HTML sent by Apache, so XPath is well suited to recovering it.

This makes PHP follow the redirect. I don't want to follow the redirect, I just want to know the URL of the redirected page.

Oh, so you don't actually want to fetch the page? Just find out the location? In that case, I'd suggest this tactic: basically just grab the headers from the page that redirects, and take the Location: header from them. Either way, though, you still need to call exec() for cURL to actually do anything...

I suggest taking a look at Luca Camillo's solution below, because this solution does not consider multiple redirects.

This solution opens the new web page under the same URL. I want to change the URL while also posting parameters to that URL.

@MattGibson when I use $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE); with CURLOPT_FOLLOWLOCATION set to true, what will the HTTP code be? I mean, will it be for the first URL or for the redirect URL?

The Location: header doesn't always accompany a redirect. There is also a separate question explicitly about this.

This is one of the top hits for "curl follow redirect". To automatically follow redirects with the curl command, pass the -L or --location flag. For example: curl -L http://example.com/

I think this is a better solution, because it also unfolds multiple redirects.

Keep in mind: (OK, duh) POST data won't be resubmitted after a redirect. This happened in my case, and I felt stupid afterwards, because simply using the proper URL fixed it.

Using curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); is a security vulnerability.