PHP curl cannot get the expected page content, but Firefox can. Possible causes?


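When I use curl to fetch a page from an e-commerce site, it always gives me the same first page and ignores the starting-item parameter. When I visit the same URL in a browser, it works as usual.

Simplified code: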

// s is the starting item count, no idea what yp4p_page is for exactly yet.
$url = 'http://list.taobao.com/market/baobao.htm?cat=40&yp4p_page=4&s=176';

$ch = curl_init($url);

$header[0] = 'Accept: text/xml,application/xml,application/xhtml+xml,'
                . 'text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5';
$header[] = 'Cache-Control: max-age=0';
$header[] = 'Connection: keep-alive';
$header[] = 'Keep-Alive: 300';
$header[] = 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7';
$header[] = 'Accept-Language: en-us,en;q=0.5';

//$cookieFile = tempnam('/tmp', 'curlcookie');
$cookieFile = dirname(__FILE__) . DIRECTORY_SEPARATOR . 'curlcookies.txt';

$options = array(
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HEADER => false,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_ENCODING => 'gzip,deflate',
            CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0 FirePHP/0.6',
            CURLOPT_AUTOREFERER => true,
            CURLOPT_CONNECTTIMEOUT => 120,
            CURLOPT_TIMEOUT => 120, 
            CURLOPT_MAXREDIRS => 10, 
            CURLOPT_SSL_VERIFYHOST => 0,
            CURLOPT_SSL_VERIFYPEER => false, 
            CURLOPT_VERBOSE => 1,
            CURLOPT_HTTPHEADER => $header,
            CURLOPT_COOKIEFILE => $cookieFile,
            CURLOPT_COOKIEJAR => $cookieFile,
);

curl_setopt_array($ch, $options);

$strPageHTML = curl_exec($ch);

curl_close($ch);
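Sorry about the Chinese-language site, but if you look at the listed items and their URLs in what curl returns, their IDs are always the same as on the first page (s=0), when they should be different.

What am I doing wrong?

Edit 1: Added cookies to the code; it still does not work.

Edit 2: Edited the cookie lines to clear up any confusion. The contents of the cookie file are shown below: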

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

#HttpOnly_.taobao.com   TRUE    /   FALSE   0   cookie2 d686d4be95b4b56b61292118b43e1333
#HttpOnly_.taobao.com   TRUE    /   FALSE   1316321978  _tb_token_  eeab7e3e5ea9e
.taobao.com TRUE    /   FALSE   1321505978  t   3c473872e51e93b0cf172375b31f503a
.taobao.com TRUE    /   FALSE   0   uc1 cookie14=UoLdHCGrCsSKAg%3D%3D
.taobao.com TRUE    /   FALSE   0   v   0
.taobao.com TRUE    /   FALSE   0   _lang   zh_CN:GBK
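To check which cookies curl actually stored, the jar file can be read back in PHP. The following is a minimal sketch, not part of the original question: `parseCookieJar` is a hypothetical helper name, and it assumes the seven-column Netscape layout shown above (domain, subdomain flag, path, secure flag, expiry, name, value) with values that contain no whitespace.

```php
<?php
/**
 * Parse the text of a Netscape-format cookie file as written by libcurl.
 * Hypothetical helper for inspection only; returns one array per cookie.
 */
function parseCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $httpOnly = false;
        if (strpos($line, '#HttpOnly_') === 0) {
            // libcurl marks HttpOnly cookies with this prefix; it is not a comment.
            $httpOnly = true;
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or ordinary comment
        }
        // Real jar files are tab-separated; the listing above shows spaces,
        // so split on any run of whitespace (assumes no spaces inside values).
        $fields = preg_split('/\s+/', trim($line), 7);
        if (count($fields) < 6) {
            continue; // not a cookie line
        }
        $cookies[] = [
            'domain'   => $fields[0],
            'path'     => $fields[2],
            'secure'   => $fields[3] === 'TRUE',
            'expires'  => (int) $fields[4],
            'session'  => (int) $fields[4] === 0, // expiry 0 means session cookie
            'name'     => $fields[5],
            'value'    => $fields[6] ?? '',
            'httpOnly' => $httpOnly,
        ];
    }
    return $cookies;
}
```

Calling `parseCookieJar(file_get_contents($cookieFile))` after `curl_close($ch)` would show whether the session cookies such as `cookie2` were written and are being sent back on the next request.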

This page uses a lot of cookies, so I would not be surprised if loading it requires a session cookie. See what happens when you enable these:

curl_setopt($DATA_POST, CURLOPT_COOKIEFILE, 'cookiefile.txt'); 
curl_setopt($DATA_POST, CURLOPT_COOKIEJAR, 'cookiefile.txt');

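You should look at the cookies this site generates, and possibly some CSRF tokens, which are there to keep you from scraping it. When I inspected the page on its first load, I found: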

Set-Cookie:cookie2=b1d92ddac8aa82350a6ff5e892a8637d;Domain=.taobao.com;Path=/;HttpOnly
_tb_token_=fde3979ee6b13;Domain=.taobao.com;Path=/;Expires=Sat, 17-Sep-2011 07:09:40     GMT;HttpOnly
t=91f29eb410a21a04bf36025823c4b2ad; Domain=.taobao.com; Expires=Wed, 16-Nov-2011 07:09:40 GMT; Path=/
uc1=cookie14=UoLdHCDBHbn1eg%3D%3D; Domain=.taobao.com; Path=/
Perhaps these cookies are used to identify you as you navigate within a category.
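To compare what the server sets with what curl sends back, it can help to break a `Set-Cookie` value into its parts. This is a rough sketch rather than a full RFC 6265 parser; `parseSetCookie` is a hypothetical helper name, and it expects the header value without the leading `Set-Cookie:` label.

```php
<?php
/**
 * Split one Set-Cookie header value into name, value and attributes.
 * Hypothetical helper; simplified and not a complete RFC 6265 parser.
 */
function parseSetCookie(string $headerValue): array
{
    $parts = array_map('trim', explode(';', $headerValue));

    // Split only on the first "=" so values containing "=" stay whole,
    // e.g. "uc1=cookie14=..." keeps "cookie14=..." as the value.
    [$name, $value] = array_pad(explode('=', array_shift($parts), 2), 2, '');

    $attributes = [];
    foreach ($parts as $part) {
        if ($part === '') {
            continue;
        }
        // Flag attributes such as HttpOnly have no "=", so they map to true.
        [$key, $val] = array_pad(explode('=', $part, 2), 2, true);
        $attributes[strtolower($key)] = $val;
    }

    return ['name' => $name, 'value' => $value, 'attributes' => $attributes];
}
```

A cookie with no `expires` attribute is a session cookie, which is the kind most likely to be tied to how the site tracks the listing position.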


Searching the DOM for the token also turns up some results.
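One way to look for such tokens in the fetched HTML is an XPath query over form inputs. The sketch below is only an illustration: `findTokenInputs` is a hypothetical helper, and the assumption that the token appears in an `input` element whose `name` contains "token" is not confirmed by the page itself.

```php
<?php
/**
 * Return name => value for every <input> whose name contains "token".
 * Hypothetical helper; the matching rule is an assumption, not the site's contract.
 * Pass a non-empty HTML string (loadHTML rejects an empty one).
 */
function findTokenInputs(string $html): array
{
    $doc = new DOMDocument();

    // Real-world markup is rarely well-formed, so collect libxml errors quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tokens = [];
    foreach ($xpath->query('//input[contains(@name, "token")]') as $input) {
        $tokens[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $tokens;
}
```

Running it on `$strPageHTML` from the question would list any candidate fields; whether the site also requires them on listing requests would still have to be verified against what the browser sends.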

Would it be possible to get the information you need through the API at http://open.taobao.com/ instead of pretending to be a user visiting the pages?

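Do you know whether it can write to the txt file? You may need to change the path to "/tmp/cookiefile.txt" if you are on Linux.

Did you successfully get the actual content of page 4 (item 176, at 44 items per page), or did it only return page 1?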
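On the question of whether the cookie file is writable: curl does not raise an error when it cannot write the jar, so checking up front can rule that cause out. A minimal sketch, assuming a fallback to the system temp directory is acceptable; `resolveCookieJarPath` is a hypothetical helper name.

```php
<?php
/**
 * Return a cookie-jar path that PHP can write to.
 * Hypothetical helper: keeps the preferred path when it is usable,
 * otherwise falls back to the same file name in the system temp directory.
 */
function resolveCookieJarPath(string $preferred): string
{
    $dir = dirname($preferred);

    // Usable if the directory is writable and any existing file is writable too.
    $usable = is_dir($dir)
        && is_writable($dir)
        && (!file_exists($preferred) || is_writable($preferred));

    if ($usable) {
        return $preferred;
    }

    return sys_get_temp_dir() . DIRECTORY_SEPARATOR . basename($preferred);
}
```

The result can be passed to both `CURLOPT_COOKIEFILE` and `CURLOPT_COOKIEJAR`, in place of the hard-coded file name.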
// s is the starting item count, no idea what yp4p_page is for exactly yet.
$url = 'http://list.taobao.com/market/baobao.htm?cat=40&yp4p_page=4&s=176';

$ch = curl_init($url);

$header[0] = 'Accept: text/xml,application/xml,application/xhtml+xml,'
                . 'text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5';
$header[] = 'Cache-Control: max-age=0';
$header[] = 'Connection: keep-alive';
$header[] = 'Keep-Alive: 300';
$header[] = 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7';
$header[] = 'Accept-Language: en-us,en;q=0.5';

$cookieFile = "cookie_china"; // I've changed this value and it seems to be working fine, I get the same results

$options = array(
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HEADER => false,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_ENCODING => 'gzip,deflate',
            CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0 FirePHP/0.6',
            CURLOPT_AUTOREFERER => true,
            CURLOPT_CONNECTTIMEOUT => 120,
            CURLOPT_TIMEOUT => 120, 
            CURLOPT_MAXREDIRS => 10, 
            CURLOPT_SSL_VERIFYHOST => 0,
            CURLOPT_SSL_VERIFYPEER => false, 
            CURLOPT_VERBOSE => 1,
            CURLOPT_HTTPHEADER => $header,
            CURLOPT_COOKIEFILE => $cookieFile,
            CURLOPT_COOKIEJAR => $cookieFile,
);

curl_setopt_array($ch, $options);

$strPageHTML = curl_exec($ch);

curl_close($ch);