PHP: scraping a table with XPath
I have a table in an HTML page that looks like (pastebin url). My current code for getting the contents of the table is:
$html = htmlspecialchars("https://localhost/table.php");
$doc = new \DOMDocument();
if($doc->loadHTML($html))
{
    $result = new \DOMDocument();
    $result->formatOutput = true;
    $table = $result->appendChild($result->createElement("table"));
    $thead = $table->appendChild($result->createElement("thead"));
    $tbody = $table->appendChild($result->createElement("tbody"));
    $xpath = new \DOMXPath($doc);
    $newRow = $thead->appendChild($result->createElement("tr"));
    foreach($xpath->query("//table[@id='kurstabell']/thead/tr/th[position()>1]") as $header)
    {
        $newRow->appendChild($result->createElement("th", trim($header->nodeValue)));
    }
    foreach($xpath->query("//table[@id='kurstabell']/tbody/tr") as $row)
    {
        $newRow = $tbody->appendChild($result->createElement("tr"));
        foreach($xpath->query("./td[position()>1]", $row) as $cell)
        {
            $newRow->appendChild($result->createElement("td", trim($cell->nodeValue)));
        }
    }
    echo $result->saveXML($result->documentElement);
}
print_r($result);
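(I am using htmlspecialchars because libxml_use_internal_errors(true);
produced the error Europe/Berlin] PHP Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line:
, and I read somewhere that htmlspecialchars could be used instead.)
The current result of this snippet looks like this: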
DOMDocument Object ( [doctype] => [implementation] => (object value omitted) [documentElement] => (object value omitted) [actualEncoding] => [encoding] => [xmlEncoding] => [standalone] => 1 [xmlStandalone] => 1 [version] => 1.0 [xmlVersion] => 1.0 [strictErrorChecking] => 1 [documentURI] => [config] => [formatOutput] => 1 [validateOnParse] => [resolveExternals] => [preserveWhiteSpace] => 1 [recover] => [substituteEntities] => [nodeName] => #document [nodeValue] => [nodeType] => 9 [parentNode] => [childNodes] => (object value omitted) [firstChild] => (object value omitted) [lastChild] => (object value omitted) [previousSibling] => [attributes] => [ownerDocument] => [namespaceURI] => [prefix] => [localName] => [baseURI] => [textContent] => )
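php_error.log does not show any errors.
The expected result is the same table, output as HTML, but with all the "unnecessary" code removed.
My question:
What is wrong with the current code?

The problem is in the first line: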
$html = htmlspecialchars("https://localhost/table.php");
It should be:
$html = file_get_contents("https://localhost/table.php");
htmlspecialchars() escapes HTML special characters, so any tags in its input become plain text. Here it is also applied to the URL string itself, so the page is never fetched. When the result is parsed with loadHTML(), it comes back as a single text node rather than the expected DOM.
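
To see the fix end to end, here is a minimal sketch of the same extraction, under some assumptions. The inline $html string and its sample rows are invented stand-ins for the page at https://localhost/table.php, so the snippet runs without a server; in the real script that string would come from file_get_contents(). The sketch also calls libxml_use_internal_errors(true) before loadHTML() to keep parser warnings such as htmlParseEntityRef out of the output, and it writes cell text with createTextNode() instead of passing it as the second argument to createElement(), because createElement() does not escape its value. Both are suggestions, not something the original answer stated.

```php
<?php
// Hypothetical sample markup standing in for https://localhost/table.php.
// In the real script: $html = file_get_contents("https://localhost/table.php");
$html = <<<'HTML'
<html><body>
<table id="kurstabell" class="wide">
<thead><tr><th>#</th><th>Name</th><th>Price</th></tr></thead>
<tbody>
<tr><td><a href="?a=1&b=2">1</a></td><td>A &amp; B</td><td>10</td></tr>
<tr><td><a href="?a=2&b=3">2</a></td><td>C</td><td>20</td></tr>
</tbody>
</table>
</body></html>
HTML;

// Collect libxml parse problems internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors(); // discard whatever the parser collected

$xpath = new DOMXPath($doc);

// Build the stripped-down copy of the table in a second document.
$result = new DOMDocument();
$result->formatOutput = true;
$table = $result->appendChild($result->createElement('table'));
$thead = $table->appendChild($result->createElement('thead'));
$tbody = $table->appendChild($result->createElement('tbody'));

// Header cells, skipping the first column as in the question.
$headRow = $thead->appendChild($result->createElement('tr'));
foreach ($xpath->query("//table[@id='kurstabell']/thead/tr/th[position()>1]") as $header) {
    $th = $headRow->appendChild($result->createElement('th'));
    // Text nodes are escaped on output, so "&" in cell text is safe.
    $th->appendChild($result->createTextNode(trim($header->nodeValue)));
}

// Body rows, again skipping the first column.
foreach ($xpath->query("//table[@id='kurstabell']/tbody/tr") as $row) {
    $newRow = $tbody->appendChild($result->createElement('tr'));
    foreach ($xpath->query('./td[position()>1]', $row) as $cell) {
        $td = $newRow->appendChild($result->createElement('td'));
        $td->appendChild($result->createTextNode(trim($cell->nodeValue)));
    }
}

$output = $result->saveXML($result->documentElement);
echo $output, PHP_EOL;
```

Note that print_r() on a DOMDocument only dumps object properties, as in the output shown in the question; saveXML() (or saveHTML()) is what produces the markup.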