PHP: parsing the text between <b> (bold) tags


php parsing


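I'm trying to log in to my hulkshare account and then return my account information. The curl login works fine, but I don't think my parsing is working, because when I use var_dump($tag) it returns array(0) {}.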

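I'm trying to parse the points in my account. Below is the HTML of the table; the value I want, 6302.00 downloads, is between the bold tags: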

```html
<div id="content-wrap">
<div id="wrap-in">
<br>
<table>
<tbody><tr><td>Username:</td><td><b></b></td><td></td></tr>
<tr><td>You have collected:</td><td><b>6302.00 downloads</b></td><td><input type="button" class="btn2" value="Convert downloads" onclick="document.location='?op=convert_points'"></td></tr>
<tr><td>Used space:</td><td><b>354.0 of 200000 Mb</b></td></tr>
<tr><td>My published files link:</td><td colspan="2"><a href="" target="_blank"></a></td></tr>
<tr><td>My affiliate link:</td><td colspan="2"><a href=""></a><br><small>New user will get 10 downloads</small></td></tr>
</tbody></table>
```

My code:

```php
$cookiefile = '/temp/cookies.txt';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.hulkshare.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiefile);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'op=login&redirect=&login=XXXXXX&password=XXXXX');
curl_exec($ch);

curl_setopt($ch, CURLOPT_URL, 'http://hulkshare.com/?op=my_account');
$contents = curl_exec($ch);
curl_close($ch);

//parse
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument();
$dom->loadHTML($contents);
$xml = simplexml_import_dom($dom);
libxml_use_internal_errors(FALSE);
$tag = $xml->xpath("/table/tbody/tr/tb/b");
var_dump($tag);
```

If you want to extract several things from one page, a library such as phpQuery or QueryPath is advisable, since it simplifies this to:

```php
foreach (qp($html)->find("b") as $b) print $b->text();
```
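For comparison, here is a rough equivalent of the QueryPath loop using only PHP's built-in DOM extension, so no third-party library is needed. This is a sketch: `boldTexts()` is a hypothetical helper name, and the sample HTML is a trimmed copy of the question's table. It assumes the `dom` extension is enabled (it usually is by default).

```php
<?php
// Hypothetical helper: return the text of every <b> element in an HTML string,
// using DOMDocument instead of QueryPath.
function boldTexts(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($dom->getElementsByTagName('b') as $b) {
        $texts[] = trim($b->textContent);
    }
    return $texts;
}

// Trimmed sample based on the question's table.
$html = '<table><tr><td>Username:</td><td><b></b></td></tr>'
      . '<tr><td>You have collected:</td><td><b>6302.00 downloads</b></td></tr>'
      . '<tr><td>Used space:</td><td><b>354.0 of 200000 Mb</b></td></tr></table>';

print_r(boldTexts($html));
```

Note that the empty `<b></b>` in the Username row also shows up, as an empty string.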
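But in your case you should just do plain text extraction. You have a very clear pattern and a page that is unlikely to change much, so use a regular expression. The HTML tags make good anchors, but really you only need to look for the decimal number before "downloads":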

```php
preg_match('#([\d.]+) downloads#i', $html, $match);
$downloads = $match[1];
```
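One thing the two-line version does not handle is a page without a match (for example, if the login failed): `$match[1]` is then undefined. A minimal sketch that wraps the same pattern and returns `null` in that case; `extractDownloads()` is a hypothetical name, and the sample HTML is trimmed from the question.

```php
<?php
// Hypothetical helper: return the number in front of "downloads", or null when
// the page doesn't contain it, instead of reading an undefined $match[1].
function extractDownloads(string $html): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match('#([\d.]+) downloads#i', $html, $match) !== 1) {
        return null;
    }
    return $match[1];
}

$html = '<tr><td>You have collected:</td><td><b>6302.00 downloads</b></td></tr>'
      . '<tr><td>My affiliate link:</td><td><small>New user will get 10 downloads</small></td></tr>';

var_dump(extractDownloads($html));                  // string(7) "6302.00"
var_dump(extractDownloads('<p>Please log in</p>')); // NULL
```

`preg_match()` stops at the first match, so the later "New user will get 10 downloads" text is not picked up here. If such text could appear earlier on the page, anchoring the pattern on the `<b>` tag makes it more specific.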
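@jennifer Note that third-party libraries like QueryPath and phpQuery can be more convenient to use, but they are not required. PHP also has native extensions, DOM and XMLReader, that can be used to parse HTML.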
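As an illustration of the native route, here is a sketch of the question's DOM + XPath approach with the query corrected. As far as I can tell, the original `/table/tbody/tr/tb/b` has three problems: a leading single `/` only matches a `<table>` that is the document's root element (libxml wraps the page in `<html>`), the cell element is `td` rather than `tb`, and `<tbody>` is often inserted by browser dev tools but may be missing from the raw HTML that libxml parses. `collectedDownloads()` and the label-based query are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: find the bold value in the row labelled "You have collected:".
function collectedDownloads(string $html): ?string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // "//tr" searches the whole document and works with or without <tbody>;
    // the predicate picks the row whose first cell is the label.
    $nodes = $xpath->query('//tr[td[1] = "You have collected:"]/td[2]/b');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Trimmed sample based on the question's HTML.
$html = '<div id="content-wrap"><table><tbody>'
      . '<tr><td>Username:</td><td><b></b></td><td></td></tr>'
      . '<tr><td>You have collected:</td><td><b>6302.00 downloads</b></td></tr>'
      . '<tr><td>Used space:</td><td><b>354.0 of 200000 Mb</b></td></tr>'
      . '</tbody></table></div>';

var_dump(collectedDownloads($html)); // string(17) "6302.00 downloads"
```

With the SimpleXML object from the question, the smallest change would be a relative query such as `$xml->xpath('//td/b')`, which returns every bold cell, the downloads value among them.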
```php
preg_match('#<b>([\d.]+) downloads#i', $html, $match);
$downloads = $match[1];
```