Downloading a PDF from ieeexplore.ieee.org with PHP

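I am trying to download a PDF file from ieeexplore via PHP, but it does not seem to work. Suppose the URL is [link omitted]. I wrote the following PHP code: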

function get_web_page($url) {
    $ch  = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookie.txt');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);        
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $page = curl_exec($ch);
    curl_close($ch);
    return $page;
}
But this code fails and nothing is downloaded. I checked the HTTP headers received, shown below:

HTTP/1.0 200
Connection established
HTTP/1.1 302 Moved Temporarily
Server: Sun-ONE-Web-Server/6.1
Date: Mon, 09 Jul 2012 22:11:50 GMT
Content-length: 0
Content-type: text/html
Set-Cookie: ERIGHTS=na2vLnqZwz9xxRfO2zN8Ny66f0vHi85YE*ynGx2BtGx2FmIHkiEyx2Bg89Db6Qx3Dx3D-18x2dHeJj2k3B7UHsoix2BefrHXeAx3Dx3Dusln2oQUqj3KXiQXjOYx2BMwx3Dx3D-UQmTydx2FMwnGJOyKUw5iVDAx3Dx3D-eV0zE6ztXYKrVZluJrMMbAx3Dx3D;path=/;domain=.ieee.org
Location: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5534992&tag=1
Set-Cookie: WLSESSION=874668684.20480.0000; expires=Tue, 10-Jul-2012 22:11:48 GMT; path=/

HTTP/1.1 200 OK
Server: Sun-ONE-Web-Server/6.1
Date: Mon, 09 Jul 2012 22:11:50 GMT
Content-length: 203
Content-type: text/html; charset=UTF-8
Cache-Control: private
Product: 254
Inst: 9690
Licenseowner: 9690
Member: 0
Cache-Control: no-cache
Pragma: No-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: xploreCookies={"standardsLicenseId":"0","openUrl":"http://linkserv.lib.utk.edu:9003/sfx","enterpriseLicenseId":"0","isIp":"true","desktopReportingUrl":"null","openUrlImgLoc":"http://www.lib.utk.edu/eresources/sfx2.gif","products":"IEL|VDE|","contactName":"NA","isChargebackUser":"false","contactEmail":"NA","oldSessionKey":"na2vLnqZwz9xxRfO2zN8Ny66f0vHi85YE*ynGx2BtGx2FmIHkiEyx2Bg89Db6Qx3Dx3D-18x2dHeJj2k3B7UHsoix2BefrHXeAx3Dx3Dusln2oQUqj3KXiQXjOYx2BMwx3Dx3D-UQmTydx2FMwnGJOyKUw5iVDAx3Dx3D-eV0zE6ztXYKrVZluJrMMbAx3Dx3D","userIds":"9690","instImage":"","isInst":"true","isDelegatedAdmin":"false","isMember":"false","instName":"UNIVERSITY OF TENNESSEE","customerSurvey":"NA","smallBusinessLicenseId":"0","openUrlTxt":"NA"}; domain=.ieee.org; path=/
Set-Cookie: JSESSIONID=V6pLP7XH4nvtQYcvmVc1ry1Y51vDHhkG8SGn9y0LG8XJv3k3hmJs!-1711984930; path=/; HttpOnly
X-Powered-By: Servlet/2.5 JSP/2.1

An error has occurred while trying to load your document. Please try again. If you continue to experience issues, please contact Customer Service.2016

However, if you paste the URL in the web browser, you may access the PDF file directly.
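Since I am within my university's domain, I do not need to worry about access permission for this PDF file.

Does anyone have an idea?

Thanks~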
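
The headers above show that the final response is text/html with a 203-byte error message rather than a PDF. A minimal sketch of checking the body before saving it, assuming a hypothetical helper named looks_like_pdf() (not part of the original code) and relying on the fact that PDF files begin with the bytes %PDF-:

```php
<?php
// Hypothetical helper: returns true when $body starts with the PDF magic bytes.
// curl_exec() returns false on failure, so non-string input is rejected as well.
function looks_like_pdf($body) {
    return is_string($body) && strncmp($body, '%PDF-', 5) === 0;
}

// The error page from the question is plain text, so it is rejected.
$errorPage = 'An error has occurred while trying to load your document.';
var_dump(looks_like_pdf($errorPage));        // bool(false)
var_dump(looks_like_pdf("%PDF-1.4\n..."));   // bool(true)
```

A check like this only tells you that the server sent something other than a PDF; it does not explain why, which is what the response headers and the error text are for.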

Try this code. Paste the proxy IP into the form (if you leave it empty, no proxy is used):

        </tr>
        <tr>
            <td>To connect to the database, paste its URL</td>
            <td><input type="text" name="ssurl" value="http://www.sciencedirect.com/science/article/pii/S0301421504000928" /></td>

        </tr>
        <tr>
            <td>Proxy Ip</td>
            <td><input type="text" name="ssproxyip" value="202.202.0.163"/></td>

        </tr>
        <tr>
            <td>Proxyport</td>
            <td><input type="text" name="ssproxyport" value="3128"/></td>

        </tr>
        <tr>
            <td>Proxy username &amp; password (username:password)</td>
            <td><input type="text" name="ssproxyusernamepassword"/></td>

        </tr>
    </table>
    <input type="submit" name="ssurlsubmit" value="submit" />
    </form>



</body>
</html>

<?php

/**
 * @author nnnnn
 * @copyright 2012
 */
//removes string from the end of other
if (isset($_POST['ssurlsubmit'])) {
function removeFromEnd($string, $stringToRemove) {
     $stringToRemoveLen = strlen($stringToRemove);
     $stringLen = strlen($string);

     $pos = $stringLen - $stringToRemoveLen;    

     $out = substr($string, 0, $pos);

     return $out;
 }

//$string = 'picture.jpg.jpg';
//$string = removeFromEnd($string, '.jpg');
//$url='http://127.0.0.1/leech/';
//$url='http://www.sciencedirect.com/science/article/pii/S0301421504000928';


// Read the submitted URL and the optional output file name.
$url=$_POST['ssurl'];
$file_n=isset($_POST['file_name']) ? $_POST['file_name'] : '';
echo "URL:".htmlspecialchars($url).'<br />';


//$url = 'http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271097&_user=2501846&_pii=S0301421504000928&_check=y&_origin=article&_zone=toolbar&_coverDate=2005--31&view=c&originContentFamily=serial&wchp=dGLbVlB-zSkWb&md5=f51bc09e08b4d3eafb759ef5c08724c4&pid=1-s2.0-S0301421504000928-main.pdf';

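// Proxy host and port from the form; an empty host means no proxy.
$proxyip=isset($_POST['ssproxyip']) ? trim($_POST['ssproxyip']) : '';
$proxyoprt=isset($_POST['ssproxyport']) ? trim($_POST['ssproxyport']) : '';
// Optional proxy credentials as "username:password" (the field name matches the HTML form).
$proxyuserpassword=isset($_POST['ssproxyusernamepassword']) ? trim($_POST['ssproxyusernamepassword']) : '';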

$mypath = getcwd();
            $mypath = preg_replace('/\\\\/', '/', $mypath);
            $rand = rand(1, 15000);

            if (!file_exists("$mypath/cookies") and !is_dir("$mypath/cookies")) {
mkdir("$mypath/cookies");
} 

            $cookie_file_path = "$mypath/cookies/cookie$rand.txt";
            echo $cookie_file_path.'<br />';
            echo 'cookie2:  '.$cookie_file_path.'<br/>';
// Create the cookie file if it does not exist yet, so that cURL can write to it.
if (! file_exists($cookie_file_path))
{
    touch($cookie_file_path);
}
if (! is_writable($cookie_file_path))
{
    echo 'Cookie file missing or not writable.';
    //exit;
}
if ( ! extension_loaded('curl'))
{
    echo "You need to load/activate the curl extension.";
}


 $ss=substr($url,-4);
 $string = removeFromEnd($url, '.pdf');
 echo "ss:  ".$ss.'<br />'; 
 //$url = 'http://www.sciencedirect.com/science/jrnlallbooks/a/fulltext';
//$proxy = '200.93.148.72:3128';
$ext = substr($url, strrpos($url, '.') + 1); // extension taken from the URL; $fileName is not defined at this point
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
//curl_setopt($ch, CURLOPT_PROXY, "203.64.181.50"); //your proxy url
//curl_setopt($ch, CURLOPT_PROXYPORT, "3128"); // your proxy port number 
//curl_setopt($ch, CURLOPT_PROXYUSERPWD, "bjm:12345"); //username:pass 
//curl_setopt($ch, CURLOPT_PROXY, "202.202.0.163"); //your proxy url
//curl_setopt($ch, CURLOPT_PROXYPORT, "3128"); // your proxy port number 

// Use the proxy only when a host was actually entered in the form.
if (!empty($_POST['ssproxyip'])) {
    curl_setopt($ch, CURLOPT_PROXY, $proxyip); // proxy host
    curl_setopt($ch, CURLOPT_PROXYPORT, (int) $proxyoprt); // proxy port number
}
// Proxy credentials, if any, as "username:password".
if (!empty($_POST['ssproxyusernamepassword'])) {
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, $_POST['ssproxyusernamepassword']);
}
curl_setopt($ch, CURLOPT_TIMEOUT, 0);
//curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0); // keep response headers out of the body, otherwise they end up inside the saved file


curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);



// The handle is $ch; the original used an undefined $curl here, so the user agent was never set.
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)');
//curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
//curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
//curl_setopt($ch, CURLOPT_VERBOSE, 1);
/*
*/
//echo $string;
echo substr($url,-4).'<br />';
//echo $url;   
$proxy = $proxyip.':'.$proxyoprt;
echo 'proxyip:   '.$proxyip.'<br />';
echo 'proxy:  '.$proxy.'<br />';
$timeout = 5;
$splited = explode(':',$proxy); // Separate IP and port
echo 'splited:   '.implode(', ', $splited).'<br />'; // $splited is an array and cannot be echoed directly
echo $splited[0].'<br />';
echo $splited[1].'<br />';
/*
//if($con = @fsockopen($splited[0], $splited[1], $errorNumber, $errorMessage, $timeout)) 
if($con = @fsockopen($proxyip, $proxyoprt, $errorNumber, $errorMessage, $timeout)) 
{
    echo 'Connection successful, PROXY works!'.'<br />';
} else {
 echo 'Connection FAILED, PROXY FAIL!'.'<br />';
    echo $errorNumber .'<br />';
    echo ' ' . $errorMessage.'<br />';
}
*/
echo "if not run".'<br />';
echo $ss.'<br />';
if ($ss=='.zip' || $ss=='r.gz' ||  $ss=='.pdf') 
 { 
 echo "if runed".'<br />';
 ini_set('max_allowed_packet', '164M');
 ini_set('mysql.wait_timeout', 600);


 ini_set('max_execution_time', '200');

 ini_set('mysql.reconnect', 'On');
  ini_set('mysql.connect_timeout', 300);
ini_set('default_socket_timeout', 300);

 $file = basename($url);    
//$file_extection=extension($url);

echo "file:  ".$file.'<br />';
echo "url:   ".$url.'<br />' ;
$dir= $file;

//    $url  = 'http://www.example.com/a-large-file.zip';
    //$path = $_SERVER['DOCUMENT_ROOT'] . '/downloads/'.$file;
    $path = $file; // $_SERVER[$url] is not a valid key; save under the URL's base name in the current directory
     if (isset($file_n )&& strlen($file_n)>0) {
    $path=$file_n;
    }
    echo "path:   ".$path.'<br />' ;
    $fp = fopen($path, 'w');
    // Ask cURL to write the response body straight into the file.
    // With CURLOPT_FILE set, curl_exec() returns true or false instead of the body.
    curl_setopt($ch, CURLOPT_FILE, $fp);
    $curl_scraped_page = curl_exec($ch);
    fclose($fp);

    if ($curl_scraped_page === false) {
        echo 'cURL error: '.htmlspecialchars(curl_error($ch)).'<br />';
    } else {
        // Sending PDF headers here would fail because output has already been echoed above,
        // so only report where the file was saved.
        echo "File DONE: ".htmlspecialchars($path).' ('.filesize($path).' bytes)<br />';
    }
}else {
    // The URL does not end in .zip, r.gz or .pdf: fetch the response into memory and save it.
    $curl_scraped_page = curl_exec($ch);
    if ($curl_scraped_page === false) {
        echo 'cURL error: '.htmlspecialchars(curl_error($ch)).'<br />';
    } else {
        $file = 'file.pdf';
        file_put_contents($file, $curl_scraped_page);
        echo 'Saved '.strlen($curl_scraped_page).' bytes to '.$file.'<br />';
    }
}
/*
$file = $url; // URL to the file

$contents = file_get_contents($file); // read the remote file

touch('somelocal.pdf'); // create a local EMPTY copy

file_put_contents('somelocal.pdf', $contents); // put the fetchted data into the newly created file
*/
curl_close($ch);
echo 'curl_close'.'<br />';
echo $curl_scraped_page.'<br />';
}

?>
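
The proxy handling above can be condensed into one small function. This is only a sketch under stated assumptions: build_proxy_options() is a hypothetical helper name, and http://www.example.com/ is a placeholder URL; the form values are the ones shown in the answer. An empty proxy host yields no options at all, which matches the rule that no proxy is used when the field is left empty.

```php
<?php
// Hypothetical helper: turn the three form fields into an option array
// suitable for curl_setopt_array(). An empty host means "no proxy".
function build_proxy_options($ip, $port, $userPassword) {
    $options = array();
    $ip = trim((string) $ip);
    if ($ip === '') {
        return $options;
    }
    $options[CURLOPT_PROXY] = $ip;
    if (trim((string) $port) !== '') {
        $options[CURLOPT_PROXYPORT] = (int) $port;
    }
    $userPassword = trim((string) $userPassword);
    if ($userPassword !== '') {
        $options[CURLOPT_PROXYUSERPWD] = $userPassword; // "username:password"
    }
    return $options;
}

// Placeholder URL; the proxy values are the defaults from the form above.
$ch = curl_init('http://www.example.com/');
curl_setopt_array($ch, build_proxy_options('202.202.0.163', '3128', ''));
```

Keeping the decision in one place avoids the mismatch between form field names and the keys read from $_POST.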


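See whether setting a user agent string (CURLOPT_USERAGENT) makes any difference. Sites often behave differently, or reject the request outright, when the user agent is bogus or missing.

Hi @drew010, I tried this user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11, but it still does not work...

Even when nothing is downloaded, something is downloaded. There is no right or wrong in HTTP, only requests and responses. So what exactly went wrong?

Hi @hakre, this code is supposed to download the PDF file specified by the URL, but after I run it, I get nothing except the headers.

@little eyes: Actually there is an answer text meant for humans, and part of it tells us: "please contact Customer Service.2016". I would say you should contact customer service. Also, if you want to learn more about how curl connects and which headers it sends, check out curl's verbose mode.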
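
The two suggestions in these comments, a browser-like user agent and verbose mode, can be tried together. A minimal sketch, assuming a hypothetical helper named fetch_with_trace(); it sends curl's verbose output to a temporary stream through CURLOPT_STDERR so that the trace can be read back as a string:

```php
<?php
// Hypothetical helper: fetch $url with the given user agent and return
// array($body, $trace). $trace holds curl's verbose log (connection
// details and the request headers that were sent).
function fetch_with_trace($url, $userAgent) {
    $log = fopen('php://temp', 'w+');
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_STDERR, $log);
    $body = curl_exec($ch); // false on failure
    unset($ch);             // release the handle before closing its log stream
    rewind($log);
    $trace = stream_get_contents($log);
    fclose($log);
    return array($body, $trace);
}

// Usage (performs a real request):
// list($body, $trace) = fetch_with_trace($url, 'Mozilla/5.0 (compatible; example)');
// echo '<pre>'.htmlspecialchars($trace).'</pre>';
```

Comparing the trace with the request a browser sends for the same URL is one way to see which header or cookie differs.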