PHP: How to optimize this code to extract a title


Here is the sample code I use to extract the title of any website:

function fread_url($url, $ref = "")
{
    if (function_exists("curl_init")) {
        $ch = curl_init();
        $user_agent = "googlebot";
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt($ch, CURLOPT_HTTPGET, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        $html = curl_exec($ch);
        curl_close($ch);
    } else {
        $html .= file_get_contents($urweb);
    }
    return $html;
}
////////////////////////////////////
$doc = new DOMDocument();
@$doc->loadHTML(@fread_url($urweb));
$titlelist = $doc->getElementsByTagName("title");
if ($titlelist->length > 0) {
    $wbtitle = $titlelist->item(0)->nodeValue;
}
echo $wbtitle;

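My question is: how can I modify this script so that it spends at most 5 seconds on a website and returns an empty value if no title is available by then? Right now, for some websites, extracting the title takes 5 seconds or even longer.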

Set a timeout for cURL:

curl_setopt($ch, CURLOPT_TIMEOUT, 5);

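It looks like you are already trying to do this with CURLOPT_CONNECTTIMEOUT, but that is the wrong option. CURLOPT_CONNECTTIMEOUT is

The number of seconds to wait while trying to connect

whereas CURLOPT_TIMEOUT is

The maximum number of seconds to allow cURL functions to execute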
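
A minimal sketch of how the two options might be combined, so that both the connection phase and the whole transfer are capped. The function name fetch_html and its $timeout parameter are hypothetical and not part of the original code; the sketch also assumes that returning false on any failure is acceptable to the caller.

```php
<?php
// Hypothetical helper (not from the original post): returns the response
// body as a string, or false on any failure, including a timeout.
function fetch_html(string $url, int $timeout = 5)
{
    if (!function_exists('curl_init')) {
        return false; // cURL extension not available
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Limit for establishing the connection only.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    // Limit for the whole request, including the download itself.
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    // curl_exec() returns false when the request fails or times out.
    // The handle is released when $ch goes out of scope.
    return curl_exec($ch);
}
```

With both limits set, a slow server can no longer hold the script for longer than roughly $timeout seconds per request; the caller only has to check for false.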

You can rewrite the function entirely as follows. If you need to keep the fread_url() function, you can also create a separate function:

function get_page_title($url, $ref = "") {
    if (function_exists("curl_init")) {
        $user_agent = "googlebot";
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt($ch, CURLOPT_HTTPGET, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
        // Limits only the connection phase; the comments below suggest CURLOPT_TIMEOUT
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        $html = curl_exec($ch);
        curl_close($ch);
    } else {
        // Fallback without cURL: use the $url parameter, not an undefined variable
        $html = file_get_contents($url);
    }

    // The download failed or returned nothing
    if ($html === false || empty($html))
        return false;

    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ suppresses warnings about malformed markup
    $titlelist = $doc->getElementsByTagName("title");

    // Empty string when the page has no <title> element
    return $titlelist->length > 0 ? $titlelist->item(0)->nodeValue : '';
}

$wbtitle = get_page_title($urweb);
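
The parsing step can also be kept separate from the download, which makes the "return empty when there is no title" behaviour easy to check on its own. The following is only a sketch under that assumption: the name extract_title is hypothetical, and it uses libxml_use_internal_errors() in place of the @ operator to keep parser warnings quiet.

```php
<?php
// Hypothetical helper (not from the original post): returns the trimmed
// text of the first <title> element, or '' when there is none.
function extract_title($html): string
{
    // A failed download (false) or an empty body yields an empty title.
    if (!is_string($html) || trim($html) === '') {
        return '';
    }

    // Collect libxml warnings about malformed markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    return $titles->length > 0 ? trim($titles->item(0)->nodeValue) : '';
}
```

Because the function accepts false as well as a string, the result of any download routine can be passed to it directly.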

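$user_agent = "googlebot"?

Great effort, thanks.

But $html .= should be $html =, and if ($html === false || empty($html)) should be: if (!$html) return false. You should also give $html a default value at the start of the function: $html = FALSE. ... Oh, and you should be using CURLOPT_TIMEOUT.

@SkippyChalmers Actually, $html will always be set, so it shouldn't. It should at least be if (empty($html)) return false, no? Definitely not a duplicate check with empty(). However, $html will always be set as long as the concatenating assignment ".=" (or any other form of it) is not used.

Yes, I know it will still get you there either way, but you are doing it wrong. Either way, empty() essentially checks for a falsy value, or at least anything in the $html variable that is considered false is of no use to the OP and should be treated as a failure.

If curl or file_get_contents returns nothing, empty() will catch that, so is this wrong? $html is set with or without ".=", so I don't see why this is wrong. No offense, but I think you are being opinionated about this. The ".=" was a typo.

Hush, don't be like that. To explain: if ($html === false || empty($html)) should be: if (!$html) return false;.

My solution works fine and validates at two levels.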
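
For reference, the two checks debated in these comments can be compared directly. The sketch below assumes $html is always assigned before the check; the helper names check_verbose and check_short and the sample values are made up for illustration. For a variable that is set, empty($x) gives the same result as !$x, so the explicit === false test adds nothing.

```php
<?php
// Hypothetical helpers that wrap the two conditions from the discussion.
function check_verbose($html): bool
{
    return $html === false || empty($html);
}

function check_short($html): bool
{
    return !$html;
}

// Values a download routine might plausibly hand back, plus a few edge cases.
$samples = [false, null, '', '0', 0, ' ', '<html></html>'];

foreach ($samples as $value) {
    // json_encode() is used only to print each value unambiguously.
    printf(
        "%-16s verbose=%s short=%s\n",
        json_encode($value),
        json_encode(check_verbose($value)),
        json_encode(check_short($value))
    );
}
```

Note that the string "0" counts as empty under both checks, so a response body consisting only of that character would be treated as a failure either way.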