Building a page parser with PHP. Want to use jQuery/AJAX


php jquery ajax


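Alright, guys! I have been searching, but I'm having some trouble finding a solution to my problem. Before anything else, I apologize for my bad English.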

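I'm building a small parser for news articles from one specific news site. I want the code to be ready for adding other news sites as well, which is why it is structured the way it is.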
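To make adding more sites a matter of adding data rather than another `if` branch, the per-site selectors can live in one lookup table keyed by domain. The sketch below assumes that layout; `SITE_SELECTORS`, `registrableDomain()` and `selectorsFor()` are hypothetical names, and the svd.se selectors are simply the ones from the code in the question. It uses `parse_url()` instead of the hand-written regex, so `https://` URLs work too.

```php
<?php
// Hypothetical selector registry: one entry per supported news site.
// The svd.se selectors are copied from the question; add new sites here.
const SITE_SELECTORS = [
    'svd.se' => [
        'h1'          => 'h1',
        'searchparse' => 'p[class=preamble], div[class=articletext]',
    ],
];

/**
 * Returns the last two labels of the URL's host ("www.svd.se" -> "svd.se"),
 * or null if no host can be found. Accepts URLs with or without a scheme.
 * Note: this is naive for multi-part TLDs such as "co.uk".
 */
function registrableDomain(string $url): ?string
{
    // parse_url() only reports a host when a scheme is present, so add one if missing.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $labels = explode('.', strtolower($host));
    return implode('.', array_slice($labels, -2));
}

/** Returns the selectors for the URL's site, or null if the site is not supported. */
function selectorsFor(string $url): ?array
{
    $domain = registrableDomain($url);
    return $domain !== null ? (SITE_SELECTORS[$domain] ?? null) : null;
}
```

In the page itself this would replace the domain regexes and `checkDomainGetRightValues()`: `$ids = selectorsFor($_POST['s']);`, followed by a check for `null` before downloading anything.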

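I want the page to load the new content without a full page refresh. I know that retrieving the content from the chosen URL takes some time, which is why I would also like to add a progress bar from jQuery UI (I know that requires the jQuery UI distribution). The progress bar is optional.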
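One common way to get the no-refresh behaviour is to split the work: a separate PHP script (the name `parse.php` is hypothetical) does the scraping and answers with JSON, while the page sends the URL with jQuery, for example `$.post('parse.php', { s: url }, callback, 'json')`, and writes the result into a `div`. Because the download time is unknown, a jQuery UI progressbar in indeterminate mode (`value: false`, supported since jQuery UI 1.10) is probably a more honest indicator than a percentage: show it before the request and hide it when the response arrives. The sketch below covers only the server side under these assumptions; the parsing step is passed in as a callable, so it could wrap the Simple HTML DOM code from the question.

```php
<?php
// Server-side sketch for a hypothetical AJAX endpoint (e.g. parse.php) that
// returns JSON instead of a whole HTML page.

/**
 * Builds [HTTP status, payload] for one parse request.
 * $parse takes the article URL and returns an associative array
 * (e.g. ['title' => ..., 'preamble' => ..., 'body' => ...]),
 * or null when the site is not supported.
 */
function buildParseResponse(string $url, callable $parse): array
{
    $url = trim($url);
    if ($url === '') {
        return [400, ['error' => 'You need to write the whole article URL']];
    }
    $article = $parse($url);
    if ($article === null) {
        return [422, ['error' => 'This site is not supported yet']];
    }
    return [200, $article];
}

/** Sends the payload as UTF-8 JSON, leaving letters such as å, ä, ö unescaped. */
function sendJson(int $status, array $payload): void
{
    http_response_code($status);
    header('Content-Type: application/json; charset=utf-8');
    echo json_encode($payload, JSON_UNESCAPED_UNICODE);
}
```

A `parse.php` built on this would end with something like `[$status, $payload] = buildParseResponse($_POST['s'] ?? '', $yourParser); sendJson($status, $payload);`. Non-2xx statuses end up in jQuery's `.fail()` handler rather than the success callback, so the progress bar should be hidden in both places.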

I'm also using the Simple HTML DOM Parser.
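Simple HTML DOM works for this, but for comparison the same extraction can be done with PHP's built-in DOM extension (`DOMDocument` plus `DOMXPath`), which needs no extra include. This is a swapped-in technique, not what the question uses. The XPath expressions mirror the question's selectors (`h1`, `p[class=preamble]`, `div[class=articletext]`); whether they still match the live svd.se markup is an assumption, and `extractArticle()` is a hypothetical name.

```php
<?php
// Alternative sketch using PHP's built-in DOM extension instead of Simple HTML DOM.

function extractArticle(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The encoding declaration makes libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Text of the first node matching $query, or '' if there is none.
    $first = function (string $query) use ($xpath): string {
        $nodes = $xpath->query($query);
        return ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : '';
    };

    // Each non-empty paragraph of the article body, as plain text.
    $paragraphs = [];
    foreach ($xpath->query('//div[@class="articletext"]//p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }

    return [
        'title'      => $first('//h1'),
        'preamble'   => $first('//p[@class="preamble"]'),
        'paragraphs' => $paragraphs,
    ];
}
```

Returning plain text per paragraph avoids the `strip_tags()`/`html_entity_decode()` round trip, and the caller can wrap each item in `<p>` after `htmlspecialchars()`.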

<?php
//Page load time
$starttime = microtime(true);

?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>svd parser</title>
<link rel="shortcut icon" href="favicon.ico" type="image/x-icon"/>
<link rel="stylesheet" type="text/css" href="style.css"/>
<script type="text/javascript" src="jquery-1.10.2.min.js"></script>
</head>
<body>
    <div class="container">
    <div id="head">
    <h1>svd parser</h1>
    <hr>
    <form action="index.php" method="post">
    <input type="text" name="s" placeholder="enter a URL to start the svd parser" style="width: 495px;">
    <input type="submit" value="svd parser it">
    </form>


<?php

if (isset($_POST["s"]) && trim($_POST["s"]) !="") {

//what is the domain?
preg_match('@^(?:https?://)?([^/]+)@i', $_POST["s"], $matches);
$host = $matches[1];
// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "<b>domain name is: {$matches[0]}.</b><br>\n";


 function checkDomainGetRightValues($domain) {
    if ($domain == "svd.se") {
        $h1="h1";
        $page="p[class=preamble], div[class=articletext]";
        return array('h1'=> $h1,'searchparse' => $page);
    }else {
        return null;
    }
}


include('simple_html_dom.php');
$ids=checkDomainGetRightValues($matches[0]);

//Get the page, but only for supported sites
//(file_get_html() returns false if the download fails)
$html = ($ids !== null) ? file_get_html($_POST['s']) : false;

if ($ids === null) {
    echo "Sorry, " . htmlspecialchars($matches[0]) . " is not supported yet.<br>";
} elseif ($html === false) {
    echo "Could not download the page.<br>";
} else {
    // Find all h1
    $ret = $html->find($ids['h1']);

    //Strip all HTML tags (e.g. <a href>) from the first h1 and wrap it in <h1> tags
    echo "<h1>" . strip_tags($ret[0]) . "</h1>";

    //Find the article itself and ignore everything else
    //(the selectors come from checkDomainGetRightValues())
    $ret = $html->find($ids['searchparse']);

    //Print the first part of the article (the preamble) in bold, without HTML tags,
    //so you get a hint what it is all about
    echo "<p><b>". strip_tags($ret[0]) ."</b></p>";

    //Here is the actual article: strip every HTML tag except <p> so it stays readable
    $a=html_entity_decode($ret[1]);
    echo strip_tags($a, '<p>');
    $html->clear();
    unset($html);
}



}else{
    echo "You need to write the whole article URL<br>";
}


//Page load time
$totaltime = microtime(true) - $starttime;
printf('Page loaded in %.3f seconds.', $totaltime);

?>

</div>
<div id="sidebar">
<b>SvD</b>
</div>

</div>
</body>
</html>



If someone could at least point me in the right direction, I would be very grateful.

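Don't scrape without permission.

That won't be a problem.

Instead of scraping on every request, scrape periodically and store the results locally; that way it will be faster and less …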
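The last comment's suggestion can be sketched as a small file cache. This is a lazy variant, which is an assumption on my part: rather than a scheduled job, the remote page is fetched again only when the stored copy is older than a time-to-live. A cron job calling the same function would give the strictly periodic version. `cachedFetch()`, the cache directory and the TTL are all hypothetical.

```php
<?php
// Sketch: keep a local copy of each fetched page and refresh it only when
// it is older than $ttl seconds, instead of scraping on every request.

/**
 * Returns the cached content for $key if it is younger than $ttl seconds;
 * otherwise calls $fetch(), stores the result in $dir and returns it.
 */
function cachedFetch(string $key, int $ttl, callable $fetch, string $dir): string
{
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create cache directory $dir");
    }
    // Hash the key (e.g. the article URL) so it is safe to use as a file name.
    $file = $dir . DIRECTORY_SEPARATOR . sha1($key) . '.cache';

    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);
    }

    $content = $fetch();
    if (!is_string($content)) {
        // e.g. file_get_contents() returned false; don't cache a failed download.
        throw new RuntimeException("Fetching '$key' failed");
    }
    // Write to a temporary file, then rename, so readers never see a half-written file.
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $content);
    rename($tmp, $file);
    return $content;
}
```

With Simple HTML DOM this could look like `$html = str_get_html(cachedFetch($url, 900, function () use ($url) { return file_get_contents($url); }, __DIR__ . '/cache'));`, so at most one request per article reaches the news site every 15 minutes.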