PHP: copying images from a live server to local
I have about 600k image URLs in different tables, and I am downloading all the images with the code below. It works fine. (I know FTP is the best option, but somehow I can't use it.)
$queryRes = mysql_query("select url from tablName limit 50000"); //every time I am using limit
while ($row = mysql_fetch_object($queryRes)) {
    $info = pathinfo($row->url);
    $fileName = $info['filename'];
    $fileExtension = $info['extension'];
    try {
        copy("http:".$row->url, "img/$fileName"."_".$row->id.".".$fileExtension);
    } catch (Exception $e) {
        echo "<br/>\n unable to copy '$fileName'. Error: $e";
    }
}
The problem is that after some time the script starts failing with a 503 error.
I hope I have explained it well.

I wouldn't use copy myself; I'd use file_get_contents, which works fine with remote servers.
Edit:
file_get_contents also returns false on failure, so:
if( false === file_get_contents(...) )
trigger_error(...);
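A minimal sketch of that check, reusing the variables from the question's loop ($row, $fileName, and $fileExtension are assumed from there):

//file_get_contents returns false on failure, so the error can actually be caught
$data = file_get_contents('http:' . $row->url);
if (false === $data)
{
    trigger_error("unable to fetch {$row->url}", E_USER_WARNING);
}
else
{
    file_put_contents("img/{$fileName}_{$row->id}.{$fileExtension}", $data);
}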
503 is a fairly generic error, and in this case it probably means something timed out. That could be your web server, a proxy somewhere along the way, or even PHP. You need to identify which component is timing out. If it is PHP, you can use set_time_limit.

Another option may be to break the work up so that each request only processes one file (or a small batch), then redirect back to the same script to continue processing the rest. You would have to somehow maintain a list between invocations of which files have already been processed, or process them in order of database id and pass the last-used id to the script when redirecting.
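A rough sketch of that redirect idea (the last_id parameter, batch size, and column names are only illustrative):

//hypothetical: process one small batch per request, then redirect to the next batch
$lastId = isset($_GET['last_id']) ? (int) $_GET['last_id'] : 0;
$queryRes = mysql_query("select id, url from tablName where id > $lastId order by id limit 50");
$processedAny = false;
while ($row = mysql_fetch_object($queryRes))
{
    // ... download the image as before ...
    $lastId = $row->id;
    $processedAny = true;
}
if ($processedAny)
{
    //hand the remaining rows to a fresh request so no single request times out
    header('Location: ' . $_SERVER['PHP_SELF'] . '?last_id=' . $lastId);
    exit;
}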
A flag-based batching version:
//only fetch 50 urls each time
$queryRes = mysql_query("select id, url from tablName where flag=1 limit 50");
//just prefer absolute path; note that '.' (not '+') is PHP's concatenation operator
$imgDirPath = dirname(__FILE__) . '/';
while ($row = mysql_fetch_object($queryRes))
{
    $info = pathinfo($row->url);
    $fileName = $info['filename'];
    $fileExtension = $info['extension'];
    //url in the table is like //www.example.com???
    $result = fetchUrl("http:" . $row->url,
        $imgDirPath . "img/$fileName" . "_" . $row->id . "." . $fileExtension);
    if ($result !== true)
    {
        echo "<br/>\n unable to copy '$fileName'. Error:$result";
        //update flag to 3 (failed); implement set_row_flag() yourself
        set_row_flag(3, $row->id);
    }
    else
    {
        //update flag to 2 (fetched)
        set_row_flag(2, $row->id);
    }
}
function fetchUrl($url, $saveto)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 7);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    $raw = curl_exec($ch);
    $error = false;
    if (curl_errno($ch))
    {
        //transport-level failure (DNS, timeout, connection refused, ...)
        $error = curl_error($ch);
    }
    else
    {
        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        if ($httpCode != 200)
        {
            $error = 'HTTP code not 200: ' . $httpCode;
        }
    }
    curl_close($ch);
    if ($error)
    {
        return $error;
    }
    file_put_contents($saveto, $raw);
    return true;
}
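The snippet above leaves set_row_flag() to the reader ("implement it yourself"); a minimal version, assuming a numeric flag column on tablName, could be:

//hypothetical helper: mark a row as fetched (2) or failed (3)
function set_row_flag($flag, $id)
{
    mysql_query("update tablName set flag=" . (int) $flag . " where id=" . (int) $id);
}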
First... copy does not throw any exceptions... so you are not doing any error handling... that's why your script just keeps running.

Second... you should use file_get_contents or, better, cURL.

For example, you can try this function (shown further below)... (I know... it opens and closes cURL every time... it's just an example I found here)

Or even... try curl_multi_exec and load the files in parallel, that will be much faster.

See here:
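The link from that answer was not preserved here, but a rough sketch of the curl_multi_exec idea could look like this (fetchParallel and its parameters are made-up names, for illustration only):

//hypothetical: fetch a batch of image URLs in parallel
function fetchParallel(array $urls, $saveDir)
{
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $i => $url)
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 60);
        curl_multi_add_handle($mh, $ch);
        $handles[$i] = $ch;
    }
    //drive all transfers until none are still running
    $running = null;
    do
    {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh); //wait for activity instead of busy-looping
    } while ($running > 0);
    //write the successful downloads to disk
    foreach ($handles as $i => $ch)
    {
        if (curl_errno($ch) === 0 && curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200)
        {
            file_put_contents($saveDir . '/' . basename($urls[$i]), curl_multi_getcontent($ch));
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
}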
Edit:
To keep track of which files failed to download, you need to do the following:
$queryRes = mysql_query("select url from tablName limit 50000"); //every time I am using limit
while ($row = mysql_fetch_object($queryRes)) {
    $info = pathinfo($row->url);
    $fileName = $info['filename'];
    $fileExtension = $info['extension'];
    if (!@copy("http:".$row->url, "img/$fileName"."_".$row->id.".".$fileExtension)) {
        $errors = error_get_last();
        echo "COPY ERROR: ".$errors['type'];
        echo "<br />\n".$errors['message'];
        //you can add whatever code you want here... output to console, log to a file, or call exit() to stop downloading...
    }
}
More info: mysql_fetch_object
Here is the function mentioned above:
function getimg($url) {
    $headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
    $headers[] = 'Connection: Keep-Alive';
    $headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8';
    $user_agent = 'php';
    $process = curl_init($url);
    curl_setopt($process, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($process, CURLOPT_HEADER, 0);
    curl_setopt($process, CURLOPT_USERAGENT, $user_agent); //was $useragent, an undefined variable
    curl_setopt($process, CURLOPT_TIMEOUT, 30);
    curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
    $return = curl_exec($process);
    curl_close($process);
    return $return;
}
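Usage could look like this (the URL and file name are placeholders):

//hypothetical usage of getimg(): fetch one image and write it to disk
$img = getimg('http://www.example.com/images/photo.jpg');
if ($img !== false)
{
    file_put_contents('img/photo.jpg', $img);
}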
Another answer suggests simply checking copy()'s return value and flushing progress output:
$queryRes = mysql_query("SELECT id, url FROM tablName ORDER BY id");
while (($row = mysql_fetch_object($queryRes)) !== false) {
    $info = pathinfo($row->url);
    $fn = $info['filename'];
    if (copy(
        'http:' . $row->url,
        "img/{$fn}_{$row->id}.{$info['extension']}"
    )) {
        echo "success: $fn\n";
    } else {
        echo "fail: $fn\n";
    }
    flush();
}
A different approach: keep the URLs in a status table and run a small self-reloading scraper. First the table:
CREATE TABLE IF NOT EXISTS `images` (
  `id` int(60) NOT NULL AUTO_INCREMENT,
  `link` varchar(1024) NOT NULL,
  `status` enum('not fetched','fetched') NOT NULL DEFAULT 'not fetched',
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
);
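To seed this table from the question's existing URLs, a one-off statement like this could be run (assuming the question's tablName and the $DBH PDO handle set up in the script below):

//hypothetical one-off seed: copy the existing image URLs into the images table
$DBH->exec("INSERT INTO images (link) SELECT url FROM tablName");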
<?php
// how many images to download in one go?
$limit = 100;
/* if set to true, the scraper reloads itself. Good for running on localhost
   without cron job support. Just keep the browser open and the script runs
   by itself (javascript is needed) */
$reload = false;
// to prevent php timeout
set_time_limit(0);
// db connection (you need pdo enabled)
try {
    $host = 'localhost';
    $dbname = 'mydbname';
    $user = 'root';
    $pass = '';
    $DBH = new PDO("mysql:host=$host;dbname=$dbname", $user, $pass);
}
catch (PDOException $e) {
    echo $e->getMessage();
}
$DBH->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// get n number of images that are not fetched
$query = $DBH->prepare("SELECT * FROM images WHERE status = 'not fetched' LIMIT {$limit}");
$query->execute();
$files = $query->fetchAll();
// if no result, don't run
if (empty($files)) {
    echo 'All files have been fetched!!!';
    die();
}
// where to save the images?
$savepath = dirname(__FILE__) . '/scrapped/';
// fetch 'em!
foreach ($files as $file) {
    // get_url_content uses curl. Function defined later-on
    $content = get_url_content($file['link']);
    // get the file name from the url. You can use a random name too.
    $url_parts_array = explode('/', $file['link']);
    /* assuming the image url is http://abc.com/images/myimage.png, exploding
       the string by / leaves the filename as the last element of the array */
    $filename = $url_parts_array[count($url_parts_array) - 1];
    // save fetched image
    file_put_contents($savepath . $filename, $content);
    // did the image save? (check the path we actually wrote, not the raw link)
    if (file_exists($savepath . $filename)) {
        // yes? Okay, let's save the status
        $query = $DBH->prepare("update images set status = 'fetched' WHERE id = " . $file['id']);
        // output the name of the file that just got downloaded
        echo $file['link'];
        echo '<br/>';
        $query->execute();
    }
}
// function definition get_url_content()
function get_url_content($url) {
    // ummm let's make our bot look like a human
    $agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_URL, $url);
    return curl_exec($ch);
}
// reload enabled? Reload!
if ($reload)
    echo '<script>location.reload(true);</script>';