在PHP中处理脚本超时_Php - Fatal编程技术网

在PHP中处理脚本超时

php

在PHP中处理脚本超时,php,Php,我被要求编写一个脚本，解析页面中的所有href，然后访问每个href并检查每个页面是否已启动并正在运行（使用CURL调用中的HTTP代码）。我有以下几点： <?php foreach($links_array as $link_array): //$links_array are a tags $link_array=str_replace("'", "\"", $link_array); // if href='link' instead of href=

我被要求编写一个脚本，解析页面中的所有href，然后访问每个href并检查每个页面是否已启动并正在运行（使用CURL调用中的HTTP代码）。我有以下几点：

<?php foreach($links_array as $link_array): //$links_array are a tags
                $link_array=str_replace("'", "\"", $link_array); // if href='link' instead of href="link"
                $href= get_attribute($link_array, "href");
                $resolved_address= resolve_address($href, $page_base);
                $download_href= http_get($resolved_address,$ref );
                $url=$download_href['STATUS']['url'];
                $http_code=$download_href['STATUS']['http_code'];
                $total_time=$download_href['STATUS']['total_time'];
                $message=$status_code_array[$download_href['STATUS']['http_code']];
                // $status_code_array is an array 
                //if you refer to its index using the http code it give back the human
                //readable message of the code 
                ?>
                <tr>
                <td><?php echo $url ?></td>
                <td><?php echo $http_code ?></td>
                <td><?php echo $http_code ?></td>
                <td><?php echo $total_time ?></td>
                </tr>
           <?php endforeach;?>

您可以在php.ini文件中设置最大执行时间。确保使用的是正确的文件，因为可能有两个文件（一个用于fpm，一个用于cli）
您可以在此处查看您的文件：
php --ini

您还可以在脚本中设置执行时间
ini_set('max_execution_time', 300);

或者，也可以在php命令中设置时间
php -dmax_execution_time=300 script.php

要回答您的其他问题：
在这种情况下，生产软件是如何工作的
一种方法（在PHP中）是使用worker（RabbitMQ/AMQP）。这意味着您有一个脚本将消息“发送”到队列和n个工作进程中。这些工作人员从该队列中提取消息，直到该队列为空

我是否可以通过捕获致命的“超过60秒的最大执行时间”错误来继续进行CURL调用
是的，但是没有抛出异常。您可以通过以下方式实现：
if (curl_errno($ch)){
    echo 'Request Error:' . curl_error($ch);
}

对于指向损坏服务器的链接，curl超时可能需要很长时间。如果有10个断开的链接，脚本可能需要几分钟才能完成
我建议将links\u数组存储在一些数据库、xml或json文件中，并带有一个检查队列。并创建一个脚本，该脚本将检查队列中的所有链接，并将http_代码响应和其他数据存储在此数据库或xml数据中
然后您需要一个ajax脚本，该脚本将每隔X秒查询服务器，以从xml文件或数据库获取所有新的已检查链接，并将这些数据放在html页面上
您可以使用cron作业或rabbitMQ启动链接检查脚本。
使用CURLOPT\u TIMEOUT。
您的更新代码：
ini_set('max_execution_time', 0);

foreach($links_array as $link){
    $start       = microtime(true);
    $link        = get_attribute( str_replace( '\'', '"', $link ), 'href' );
    $url         = resolve_address( $link, $page_base );
    $http_code   = getHttpCode( $url );
    $total_time  = microtime(true) - $start;
    if($http_code != 0){
        echo '<tr>
                <td>' . $url . '</td>
                <td>' . $http_code . '</td>
                <td>' . $total_time . ' s. </td>
            </tr>';
    }
}

function getHttpCode( $url )
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $output = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $httpcode;
}

ini\u集（'max\u execution\u time'，0）；
foreach（$links\u数组作为$link）{
$start=microtime（真）；
$link=get_属性（str_replace（“\”，$link），“href”）；
$url=解析地址（$link，$page\u base）；
$http_code=getHttpCode（$url）；
$total_time=微时间（真）-$start；
如果（$http_代码！=0）{
回声'
“.$url。”
“.$http_代码。”
“.$total_time.”s。
';
}
}
函数getHttpCode（$url）
{
$ch=curl\u init（$url）；
curl_setopt（$ch，CURLOPT_头，true）；
curl_setopt（$ch，CURLOPT_NOBODY，true）；
curl_setopt（$ch，CURLOPT_RETURNTRANSFER，1）；
curl_setopt（$ch，CURLOPT_超时，10）；
$output=curl\u exec（$ch）；
$httpcode=curl\u getinfo（$ch，CURLINFO\u HTTP\u代码）；
卷曲关闭（$ch）；
返回$httpcode；
}
为什么需要替换str_
？如果href='link'而不是href='link，则此注释“
不正确，两种引号类型都有效。从命令行运行脚本时不应有超时。如果从浏览器运行，您可以将超时设置为0，但我不希望爬虫程序执行得很快…也不依赖爬虫程序的输出。我知道这两个都是有效的，但我的get_属性函数只解析“而不是”。我需要稍后再研究这一点。我将尝试使用cmd。谢谢。它在命令行中运行得更快。它有一个超时，我将其设置为0。我也很好奇，为什么不依赖爬虫的输出呢？哦，你必须自己编写解析器吗？Domdocument已经具有该功能，并且将同时使用这两种功能