PHP Simple HTML DOM: traversing HTML

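I am using the Simple HTML DOM parser and trying to scrape some data from a scoreboard. The example below shows how to extract the HTML of one of its tables.

The first column, $tr->find('td', 0), contains a hyperlink. How can I extract this hyperlink? Using $tr->find('td', 0)->find('a') does not seem to work.

Also: I could write a case for each table (Passing, Rushing, Receiving, etc.), but is there a more efficient way to do this? I am open to suggestions.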

include('simple_html_dom.php');
$html = file_get_html('http://espn.go.com/ncf/boxscore?gameId=322432006');

$teamA['rushing'] = $html->find('table.mod-data',5);

foreach ($teamA as $type=>$data) {
  switch ($type) {
    # Rushing Table
    case "rushing":
       foreach ($data->find('tr') as $tr) {
        echo $tr->find('td', 0);    // First TD column (Player Name)
        echo $tr->find('td', 1);    // Second TD Column (Carries)
        echo $tr->find('td', 2);    // Third TD Column (Yards)
        echo $tr->find('td', 3);    // Fourth TD Column (AVG)
        echo $tr->find('td', 4);    // Fifth TD Column (TDs)
        echo $tr->find('td', 5);    // Sixth TD Column (LGs)
        echo "<hr />";
        }
   }
}

According to the documentation, you should be able to chain selectors for nested elements.

This is the example they give:

// Find first <li> in first <ul>    
$e = $html->find('ul', 0)->find('li', 0);

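The only difference I can see is that they pass an index to the second find. Without an index, find('a') returns an array of elements rather than a single element. Try adding the index and see whether that works for you.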
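To illustrate the difference between the indexed and non-indexed calls, here is a minimal sketch on an inline HTML fragment. The markup, player name, and link are made up for illustration and are not taken from the ESPN page. It assumes simple_html_dom.php sits in the same directory, as in the question.

```php
<?php
include('simple_html_dom.php');

// Hypothetical markup standing in for one row of the scoreboard table.
$html = str_get_html(
    '<table><tr><td><a href="/player/1">J. Doe</a></td><td>12</td></tr></table>'
);

$tr = $html->find('tr', 0);

// Without an index, find() returns an array of matching elements,
// so ->href cannot be read from the result directly.
$anchors = $tr->find('td', 0)->find('a');
$isArray = is_array($anchors);

// With an index, find() returns a single element, or null if nothing matches.
$anchor = $tr->find('td', 0)->find('a', 0);
$href = $anchor ? $anchor->href : null;       // the link target
$text = $anchor ? $anchor->plaintext : null;  // the link text

var_dump($isArray);
echo $href, "\n", $text, "\n";
```

Checking the result for null before reading href keeps the code from failing on cells that contain no link.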

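In your example, find('tr') returns 10 elements, not just the 7 data rows.

In addition, not every name has a link associated with it, and trying to read the link when it does not exist can raise an error.

So here is a modified, working version of your code: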

include('simple_html_dom.php');

$url = 'http://espn.go.com/ncf/boxscore?gameId=322432006';

$html = file_get_html($url);

$teamA['rushing'] = $html->find('table.mod-data',5);

foreach ($teamA as $type=>$data) {
  switch ($type) {
    # Rushing Table
    case "rushing":
        echo count($data->find('tr')) . " \$tr found !<br />";

        foreach ($data->find('tr') as $key => $tr) {

            $td = $tr->find('td');

            if (isset($td[0])) {
                echo "<br />";
                echo $td[0]->plaintext . " | ";         // First TD column (Player Name)

                // If anchor exists
                if($anchor = $td[0]->find('a', 0))
                    echo $anchor->href;                 // href

                echo " | ";

                echo $td[1]->plaintext . " | ";     // Second TD Column (Carries)
                echo $td[2]->plaintext . " | ";     // Third TD Column (Yards)
                echo $td[3]->plaintext . " | ";     // Fourth TD Column (AVG)
                echo $td[4]->plaintext . " | ";     // Fifth TD Column (TDs)
                echo $td[5]->plaintext;             // Sixth TD Column (LGs)
                echo "<hr />";
            }

        }
   }
}
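On the second part of the question (avoiding one case per table), one possible approach is to read every table through a single generic function and drive it from a map of stat type to table index. The sketch below is only an illustration: extractRows() is a hypothetical helper, the inline markup is made up, and the indexes 0 and 1 fit that sample only. On the real page the only index known from the question is 5 for the rushing table.

```php
<?php
include('simple_html_dom.php');

// Hypothetical helper: turn one stats table into an array of rows.
// Each row holds the player link (or null) and the plain text of every cell.
function extractRows($table)
{
    $rows = array();
    foreach ($table->find('tr') as $tr) {
        $cells = $tr->find('td');
        if (count($cells) === 0) {
            continue; // skip rows without <td> cells, e.g. header rows using <th>
        }
        $anchor = $cells[0]->find('a', 0);
        $row = array('link' => $anchor ? $anchor->href : null, 'cells' => array());
        foreach ($cells as $td) {
            $row['cells'][] = trim($td->plaintext);
        }
        $rows[] = $row;
    }
    return $rows;
}

// Made-up markup with two tables; real data would come from file_get_html().
$html = str_get_html(
    '<table class="mod-data"><tr><th>Passing</th><th>YDS</th></tr>'
    . '<tr><td><a href="/player/1">A. Smith</a></td><td>210</td></tr></table>'
    . '<table class="mod-data"><tr><th>Rushing</th><th>YDS</th></tr>'
    . '<tr><td><a href="/player/2">B. Jones</a></td><td>85</td></tr>'
    . '<tr><td>Team</td><td>85</td></tr></table>'
);

// Stat type => table index (valid for the sample markup above only).
$tables = array('passing' => 0, 'rushing' => 1);

$stats = array();
foreach ($tables as $type => $index) {
    $table = $html->find('table.mod-data', $index);
    if ($table) {
        $stats[$type] = extractRows($table);
    }
}

print_r($stats);
```

Adding another table then means adding one entry to $tables rather than another case block.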

PHP Simple HTML DOM Parser has a known memory leak issue under PHP 5, so remember to free the memory once a DOM object is no longer in use:

$html = file_get_html(...);

// do something... 

$html->clear(); 
unset($html);

Source: http://simplehtmldom.sourceforge.net/manual_faq.htm#memory_leak
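The same pattern matters most when several documents are parsed in one script. The following is a rough sketch, assuming simple_html_dom.php is in the same directory; the inline fragments are made up and stand in for pages that would normally be loaded with file_get_html().

```php
<?php
include('simple_html_dom.php');

// Made-up fragments standing in for pages fetched with file_get_html().
$pages = array(
    '<table><tr><td>J. Doe</td><td>12</td></tr></table>',
    '<table><tr><td>R. Roe</td><td>7</td></tr></table>',
);

$names = array();
foreach ($pages as $markup) {
    $html = str_get_html($markup);
    if (!$html) {
        continue; // parsing failed, nothing to release
    }

    // Copy out plain strings before clearing; do not keep DOM nodes around.
    $names[] = $html->find('td', 0)->plaintext;

    // Release the DOM for this page before moving on to the next one.
    $html->clear();
    unset($html);
}

print_r($names);
```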