PHP cURL downloads 0 bytes from an HTTPS page that is only accessible with a username and password

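I am trying to download a zip file from Lending Club's website (www.lendingclub.com).

So far I have determined that I must be logged in to download the file. The download URL looks like this: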

https://resources.lendingclub.com/secure/LoanStats3a_securev1.csv.zip?signature=LoEEC1JOFCjfwhv3y6atOMnD2rA%3D&issued=1459641477069

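The signature and issued fields change every time I log in to the site. If I copy and paste the URL into another browser window, I can download the file.

I believe the site checks that the signature and issued values are valid before it lets me download the file.

I can log in to the site and navigate to the page that lists the file, and I am doing this with cURL. I am able to capture the specific URL with its signature and issued fields. However, when I run the cURL request to download it, I receive a response with HTTP code 401.

It seems the site does not recognize that I am logged in, so it responds with 401.

Here is the code I use to log in and download the file: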

$cookie = 'cookie.txt';
$url = 'https://www.lendingclub.com/account/login.action';

//first cURL request to obtain cookie
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie); // Stores cookies in the temp file 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
$output = curl_exec($ch);

//second cURL request to submit my login credentials and login to the site
$fields = array( 
    'login_email' => 'email@example.com', 
    'login_password' => 'mypassword', 
);
$fields_string = ''; 
foreach($fields as $key=>$value)
{ 
    $fields_string .= $key . '=' . $value . '&'; 
}
$fields_string = rtrim($fields_string, '&'); // rtrim() returns the trimmed string; it does not modify its argument
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_POST, true); // CURLOPT_POST is a boolean switch, not a field count
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); //Uses cookies from the temp file 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie); // Stores cookies in the temp file 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Tells cURL to follow redirects 
$output = curl_exec($ch);

//third cURL request to get url where the file I want to download is.
$url = 'https://www.lendingclub.com/info/download-data.action'; 
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); //Uses cookies from the temp file 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie); // Stores cookies in the temp file 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
$output = curl_exec($ch); 

//regular expression to capture the url (with signature and issued fields)
$regex = '/\b(https?|ftp|file):\/\/resources\.lendingclub\.com\/secure[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
preg_match_all($regex, $output, $parts);
$url3a = $parts[0][0];
OutputMsg($url3a); //output the url to confirm I captured the whole url including the query string

//fourth cURL to download the zip file
set_time_limit(0); //prevent timeout
$fp = fopen (dirname(__FILE__) . '/' . 'testfile.zip', 'w+');
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_TIMEOUT, 5040);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs 
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_URL, $url3a); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); //Uses cookies from the temp file 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie); // Stores cookies in the temp file 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Tells cURL to follow redirects 
$output = curl_exec($ch); 
$info = curl_getinfo($ch);
curl_close($ch);
fclose($fp);
var_dump($info);
var_dump($output);
return;
The response I get is:

array(23) {
  ["url"]=> string(135) "https://resources.lendingclub.com/secure/LoanStats3a_securev1.csv.zip?signature=LoEEC1JOFCjfwhv3y6atOMnD2rA%3D&issued=1459641477069"
  ["content_type"]=> NULL
  ["http_code"]=> int(401)
  ["header_size"]=> int(201)
  ["request_size"]=> int(192)
  ["filetime"]=> int(-1)
  ["ssl_verify_result"]=> int(0)
  ["redirect_count"]=> int(0)
  ["total_time"]=> float(0.229254)
  ["namelookup_time"]=> float(0.026935)
  ["connect_time"]=> float(0.065868)
  ["pretransfer_time"]=> float(0.187812)
  ["size_upload"]=> float(0)
  ["size_download"]=> float(0)
  ["speed_download"]=> float(0)
  ["speed_upload"]=> float(0)
  ["download_content_length"]=> float(0)
  ["upload_content_length"]=> float(0)
  ["starttransfer_time"]=> float(0.22921)
  ["redirect_time"]=> float(0)
  ["certinfo"]=> array(0) { }
  ["primary_ip"]=> string(14) "216.115.73.151"
  ["redirect_url"]=> string(0) ""
}
bool(true)

Any suggestions on what I could do differently to download the file?

Thanks.

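Update #1: implementing drew010's suggestion from the comments

I navigated to the download page in my browser and clicked the link to download the file. Here are the headers my browser sent: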

GET /secure/LoanStats3a_securev1.csv.zip?signature=4TWzCzq1bGdLXb3l76L6T6ElX1c%3D&issued=1459660640149 HTTP/1.1
Host: resources.lendingclub.com
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36
Referer: https://www.lendingclub.com/info/download-data.action
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
Cookie: <deleted for privacy>

Still the same problem. It returns code 401.
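For illustration only, browser headers like the ones above could be attached to a cURL handle roughly as follows. The helper name formatHeaderLines is hypothetical, and this is a sketch of the general technique, not the exact code used for this update. No request is sent.

```php
<?php
// Hypothetical helper: turn ['Name' => 'value'] pairs into the
// "Name: value" lines that CURLOPT_HTTPHEADER expects.
function formatHeaderLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Header values copied from the browser request shown above.
$browserHeaders = [
    'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Upgrade-Insecure-Requests' => '1',
    'Accept-Language' => 'en-US,en;q=0.8',
];
$userAgent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36';
$referer = 'https://www.lendingclub.com/info/download-data.action';

// Only configure a handle when the cURL extension is loaded.
if (function_exists('curl_init')) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HTTPHEADER, formatHeaderLines($browserHeaders));
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    // An empty string lets cURL advertise the encodings it supports
    // and decode the response body itself.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    unset($ch); // release the handle
}
```

The same handle could be kept and reused for the login, listing, and download requests instead of being released here.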

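I found the problem. It had nothing to do with the cURL request itself or with the 401 code.

I get the URL of the file I want to download by parsing the output of a cURL request (see below):

$url3a = $parts[0][0];

The problem was that the URL had its "&" encoded as "&amp;". When I echoed the string to the screen, I only saw "&" rather than "&amp;", presumably because the browser rendered the entity.

So, after playing with strlen and strpos, I found the problem and fixed it by replacing the line:

$url3a = $parts[0][0];

with:

$url3a = htmlspecialchars_decode($parts[0][0]);

That solved the problem.

Thanks.

Try adding more HTTP headers (including the User-Agent) so the request looks like it comes from a real browser. Also, there is no need to create a new cURL handle each time; you can reuse the same handle for every request to simplify things.

@drew010 Thanks for the response. I did check the headers sent when downloading from a browser window and added them to the cURL request. It still doesn't work. I have updated the question with the results.

Can you confirm that the cookie file is being created? I can try creating an account tomorrow to see whether I can get it working.

@drew010 The cookie file is being created. It works for the first three cURL requests. Thanks for your help.

@drew010 I finally found the problem. I added an answer to document it for future reference. Thank you very much for your help.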
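The fix from the answer can be sketched as a small self-contained function. The name extractDownloadUrl and the sample markup are made up for illustration; the pattern is adapted from the one in the question, restricted to http/https.

```php
<?php
// Hypothetical helper: find the first secure download link in a page and
// undo HTML entity encoding so that "&amp;" becomes "&" again.
function extractDownloadUrl(string $html): ?string
{
    $regex = '/\bhttps?:\/\/resources\.lendingclub\.com\/secure[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
    if (!preg_match($regex, $html, $match)) {
        return null; // no link found
    }
    return htmlspecialchars_decode($match[0]);
}

// Made-up markup: inside an href attribute the "&" is written as "&amp;".
$html = '<a href="https://resources.lendingclub.com/secure/file.zip?signature=abc%3D&amp;issued=123">Download</a>';

$url = extractDownloadUrl($html);
echo $url, PHP_EOL;
```

Comparing strlen() of the raw match with the decoded string makes the hidden entity visible: each "&amp;" adds four characters. The dump in the question is consistent with this, since it reports string(135) for a URL that is 131 characters long as displayed.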