PHP: Extracting data from HTML with Simple HTML DOM Parser

php parsing

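For a university project I am building a website with some back-end algorithms, and to test those algorithms in a demo environment I need a lot of fake data. To get that data I plan to scrape a few websites, one of which is freelancer.com. I am using Simple HTML DOM Parser to extract the data, but so far my attempts to actually get the data I need have been unsuccessful.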

Below is an example of the HTML layout of one of the pages I intend to scrape. The red boxes mark the data I need.

Here is the code I wrote after following some tutorials:

<?php
include "simple_html_dom.php";
// Create DOM from URL
$html = file_get_html('http://www.freelancer.com/jobs/Website-Design/1/');

//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table[id=project_table] tr') as $tr) {

    foreach($tr->find('td[class=title-col]') as $t) {
        //get the element's outer HTML (the tag plus its contents)
        $data = $t->outertext;
        echo $data;
    }
}

?>

I hope someone can point me in the right direction on how to get this working.

Thanks.

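The raw page source is different from what you see in the browser, which is why you are not getting the expected results.

You can check the raw source with Ctrl+U: the data is in table[id=project_table_static], and its td cells have no attributes. So here is working code that gets all the URLs from the table: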

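<?php
include "simple_html_dom.php";

$url = 'http://www.freelancer.com/jobs/Website-Design/1/';
// Create DOM from URL (file_get_html returns false on failure)
$html = file_get_html($url);
if ($html === false) {
    die("Could not load $url");
}

// Iterate over the rows of <table id="project_table_static">
foreach ($html->find('table#project_table_static tbody tr') as $i => $tr) {

    // Skip the first (empty) row
    if ($i == 0) {
        continue;
    }

    echo "<br/>\$i=" . $i;

    // Get the first anchor in the row; find() returns null if there is none
    $anchor = $tr->find('a', 0);
    if ($anchor !== null) {
        echo " => " . $anchor->href;
    }
}

// Clear the DOM object to free memory
$html->clear();
unset($html);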
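The same extraction can be sketched with PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM, which makes it easy to test offline against a saved copy of the raw source. This is a minimal sketch, assuming raw markup shaped the way the answer describes (a table with id project_table_static, plain td cells, an empty first row); the function name extractRowLinks and the sample HTML are hypothetical, and the real site's markup may differ or have changed.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): returns the href of the
// first <a> in each row of <table id="$tableId">, skipping the first row.
function extractRowLinks(string $html, string $tableId): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // "//tr" matches rows with or without an explicit <tbody>.
    $rows = $xpath->query(sprintf('//table[@id="%s"]//tr', $tableId));

    $links = [];
    for ($i = 1; $i < $rows->length; $i++) { // start at 1: skip the empty first row
        $tr = $rows->item($i);
        // First anchor inside this row; item(0) is null if there is none.
        $anchor = $xpath->query('.//a', $tr)->item(0);
        if ($anchor instanceof DOMElement) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Hypothetical raw source shaped as the answer describes: plain <td> cells,
// an empty first row, id="project_table_static".
$rawHtml = <<<HTML
<table id="project_table_static"><tbody>
<tr></tr>
<tr><td><a href="/projects/1">Logo design</a></td><td>10</td></tr>
<tr><td><a href="/projects/2">Landing page</a></td><td>25</td></tr>
</tbody></table>
HTML;

echo implode("\n", extractRowLinks($rawHtml, 'project_table_static')), "\n";
// prints /projects/1 and /projects/2 on separate lines

// The question's id does not occur in this raw source, so nothing matches.
echo count(extractRowLinks($rawHtml, 'project_table')), "\n"; // prints 0
```

Selecting the anchor by position rather than by a class on the td mirrors the point that the raw cells carry no attributes to select on.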

Check the raw source (Ctrl+U); the data is in table[id=project_table_static]

project_table_static does not work.
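When a selector such as table#project_table_static stops matching (sites do change their markup), one way to re-check what Ctrl+U shows is to list the table ids actually present in the raw HTML. A minimal sketch using PHP's built-in DOMDocument rather than Simple HTML DOM; the function name listTableIds and the sample markup are hypothetical, and in practice the input string could come from file_get_contents($url).

```php
<?php
// Hypothetical diagnostic: return the id of every <table> in the raw HTML
// that has one, in document order, to see what a selector could target.
function listTableIds(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $ids = [];
    foreach ($doc->getElementsByTagName('table') as $table) {
        if ($table->hasAttribute('id')) {
            $ids[] = $table->getAttribute('id');
        }
    }
    return $ids;
}

// Hypothetical raw source in which the expected table id is absent.
$rawHtml = '<div><table id="jobs_list"><tr><td>x</td></tr></table>'
         . '<table><tr><td>y</td></tr></table></div>';

$ids = listTableIds($rawHtml);
echo implode(', ', $ids), "\n"; // prints jobs_list
var_dump(in_array('project_table_static', $ids, true)); // bool(false)
```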