PHP cURL error: Maximum (20) redirects followed

php curl web-scraping

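When I try to fetch a myntra page with cURL, it gives an error. I also tried to extract the details through DOMDocument, but I get the same error:

Maximum (20) redirects followed

Here is my code: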

<?php
        $url = 'http://www.myntra.com/sports-shoes/nike/nike-men-black-dart-12-msl-running-shoes/1547908/buy?src=search&uq=false&q=nike&p=1';
        $ch  = curl_init($url);
        //curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)");
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: test=cookie"));

        // Run the request once and test the stored result.
        $cl  = curl_exec($ch);
        if($cl === false)
        {
                echo 'Curl error: ' . curl_error($ch);
                echo 'Curl error number: ' . curl_errno($ch);
        }else{
           $dom = new DOMDocument();
           $xpath = new DOMXpath($dom);
           print_r($xpath);            
        }
?>

Use the CURLOPT_MAXREDIRS option for this:

curl_setopt($ch, CURLOPT_MAXREDIRS , 1000);

I hope it works. Good luck.
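Raising the limit only helps when the redirect chain is long but finite. One way to check is to follow the redirects one hop at a time and stop as soon as a URL repeats. The following is a minimal sketch under that assumption; `nextLocation` and `traceRedirects` are hypothetical helper names, not part of cURL.

```php
<?php
// Hypothetical helper: performs one request without following redirects and
// returns the redirect target, or null when the response is not a redirect.
function nextLocation(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // one hop at a time
    curl_exec($ch);
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return $target ?: null;
}

// Hypothetical helper: walks the chain using $next and reports whether a URL
// was visited twice, which indicates a loop.
function traceRedirects(string $url, callable $next, int $maxHops = 20): array
{
    $seen = [];
    while ($url !== null && count($seen) < $maxHops) {
        if (in_array($url, $seen, true)) {
            $seen[] = $url; // record the repeated URL so the loop is visible
            return ['loop' => true, 'hops' => $seen];
        }
        $seen[] = $url;
        $url = $next($url);
    }
    return ['loop' => false, 'hops' => $seen];
}

// Usage (performs real requests):
// print_r(traceRedirects('http://www.myntra.com/', 'nextLocation'));
```

Because `nextLocation` sends no cookies, a site that redirects until a cookie is accepted would show up here as a loop, and no value of CURLOPT_MAXREDIRS would get past it.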

Add a cookie file.

<?php

$url = 'http://www.myntra.com/sports-shoes/nike/nike-men-black-dart-12-msl-running-shoes/1547908/buy?src=search&uq=false&q=nike&p=1';
$ch  = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0");
$request_headers = [
                'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8;',
                'Accept-Encoding: gzip, deflate',
                "Connection: keep-alive",
                "Content-Type: text/html; charset=UTF-8",

            ];
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request_headers);
curl_setopt($ch, CURLOPT_ENCODING, "");
$cl  = curl_exec($ch);
$h = curl_getinfo($ch);
$e = curl_error($ch);
curl_close($ch);
var_dump($cl);

Like this:

curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . '/cookie.txt');

It should work.
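Put together, the cookie suggestion might look like the sketch below. It assumes the redirects stop once the site's cookies are accepted and sent back, which the thread does not confirm. `cookieAwareOptions` and `fetchWithCookies` are hypothetical names, and the limit of 10 redirects is an arbitrary choice.

```php
<?php
// Hypothetical helper: builds the cURL options for a cookie-aware request.
function cookieAwareOptions(string $cookieFile, int $maxRedirects = 10): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => $maxRedirects, // keep the limit modest
        CURLOPT_COOKIEJAR      => $cookieFile,   // cookies are written here when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile,   // cookies are read from here; also enables the cookie engine
        CURLOPT_ENCODING       => '',            // accept and decode any encoding cURL supports
    ];
}

// Hypothetical helper: fetches $url and returns the body, or throws on failure.
function fetchWithCookies(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, cookieAwareOptions($cookieFile));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException(curl_error($ch), curl_errno($ch));
    }
    return $body;
}

// Usage (performs a real request):
// echo fetchWithCookies('http://www.myntra.com/', __DIR__ . '/cookie.txt');
```

With the cookie engine enabled, a cookie set by one response in the chain is sent on the next hop of the same handle, which is what lets a cookie-gated redirect terminate.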

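This is rarely the cure. The loop is the problem, and it cannot be fixed by allowing more laps. Usually the cause is a cookie.

With curl_setopt($ch, CURLOPT_MAXREDIRS, 1000); it just keeps loading and I still get no result!

Remove the header

'Accept-Encoding: gzip, deflate',

This is not only not a solution, it is bad advice (-1). If the request is stuck in a redirect loop (the limit in effect here is 20, per the error message), raising the maximum number of redirects only makes it loop more times before failing again, which amounts to a denial-of-service attack on the scraping target. The OP is already ignoring robots.txt, which makes it worse.

I am getting a response now, but how do I get the tag data out of it? Can you suggest something?

That is a completely different matter: it depends on how you want to parse. What do you want to get from the tag? You need to study the markup structure on myntra and then write a method for it.

I am doing this: $dom = new DOMDocument(); $dom->loadHTML($cl); $xpath = new DOMXPath($dom); $product_name = $xpath->query('//h1[@class="pdp title"]'); but I get no value. Looking at the network activity in Firebug, I see a 500 Internal Server Error.

That is indeed the case, but I don't know why.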
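Two details may explain the empty XPath result, independent of the 500 error. With CURLOPT_HEADER enabled, the string returned by curl_exec starts with the response headers, and `@class="..."` only matches when the whole attribute equals that string. The sketch below addresses both on a made-up sample. The class name `pdp-title` and the helper names are assumptions, not taken from myntra's actual markup.

```php
<?php
// Splits a raw response fetched with CURLOPT_HEADER into [headers, body].
// $headerSize is the value of curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function splitResponse(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}

// Returns the trimmed text of the first <h1> whose class list contains
// $class, or null. $class must not contain quote characters.
function firstH1WithClass(string $html, string $class): ?string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match one token of a space-separated class list, not the whole attribute.
    $query = sprintf(
        '//h1[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up sample standing in for a fetched response.
$body = '<html><body><h1 class="pdp-title big">Nike Dart 12</h1></body></html>';
$raw  = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n" . $body;

[$headers, $html] = splitResponse($raw, strlen($raw) - strlen($body));
echo firstH1WithClass($html, 'pdp-title'), "\n";
```

If the server answers with a 500 error, or the product title is inserted by JavaScript after the page loads, the element will not be in the fetched HTML and no query will find it.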