PHP Simple HTML DOM Parser - get all plain text rather than the text of a specific element


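I tried all of the solutions posted on a similar question. Although that question is very close to mine, its solutions do not work for me.

I am trying to get the plain text that sits outside the <b> element but inside the <div id="maindiv"> element.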

Answer: use DOMXPath with the query //div[@id="maindiv"]/text()[2]. Explanation:

//               - selects nodes regardless of their position in the tree

div              - selects elements whose node name is 'div'

[@id="maindiv"]  - keeps only those divs that have the attribute id="maindiv"

/                - steps down to the children of the matched div

text()           - selects only the text nodes among those children

[2]              - selects the second text node (the first is the whitespace before <b>)

                   Note: the actual position of the text node may depend on
                   your preserveWhiteSpace setting.

                   Manual: http://www.php.net/manual/de/class.domdocument.php#domdocument.props.preservewhitespace
Example:

<?php
$html = <<<EOF
<div id="maindiv">
     <b>I don't want this text</b>
     I want this text
</div>
EOF;

$doc = new DOMDocument();
$doc->loadHTML($html);

$selector = new DOMXPath($doc);

// Second text node directly under the div (the first is the whitespace before <b>)
$node = $selector->query('//div[@id="maindiv"]/text()[2]')->item(0);
if ($node !== null) {
    echo trim($node->nodeValue); // I want this text
}
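Because the position [2] depends on how whitespace text nodes are kept, a query that filters text nodes by content instead of by position may be less fragile. The following is a minimal sketch, assuming the same markup as the example; it swaps in the XPath tests not(ancestor::b) and normalize-space() in place of the positional [2], and the variable names ($xpath, $nodes, $parts, $text) are only illustrative.

```php
<?php
// Sketch (assumption: same markup as the example above). Collects all text
// under the div, skipping text inside <b> and whitespace-only text nodes.
$html = <<<EOF
<div id="maindiv">
     <b>I don't want this text</b>
     I want this text
</div>
EOF;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// //text()            -> every descendant text node of the div
// [not(ancestor::b)]  -> drop text that sits inside a <b> element
// [normalize-space()] -> drop text nodes that contain only whitespace
$nodes = $xpath->query('//div[@id="maindiv"]//text()[not(ancestor::b)][normalize-space()]');

$parts = [];
foreach ($nodes as $node) {
    $parts[] = trim($node->nodeValue);
}
$text = implode(' ', $parts);

echo $text; // I want this text
```

Unlike text()[2], this also picks up wanted text that appears after further nested elements, joining the pieces with single spaces.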
Comment: Thanks for the quick reply. Could you explain $selector->query('//div[@id="maindiv"]/text()[2]') to me?

Alternative answer, using Simple HTML DOM itself: taken as a whole, the div's text contains both strings ("I don't want this text I want this text"), so remove the first <b> element and then read the div's inner text (where $part is the <div id="maindiv"> element):

$part->find('b', 0)->outertext = '';
echo $part->innertext; // I want this text
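If Simple HTML DOM is not available, the same remove-then-read idea can be sketched with PHP's built-in DOM extension. This is a swapped-in technique, not the Simple HTML DOM API: assuming the same markup as the XPath example, it detaches the first <b> element from the div and then reads the div's textContent, which (unlike innertext) returns text without markup. The variable names are illustrative.

```php
<?php
// Sketch using the built-in DOM extension instead of Simple HTML DOM.
$html = <<<EOF
<div id="maindiv">
     <b>I don't want this text</b>
     I want this text
</div>
EOF;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Locate the div the same way as in the XPath answer.
$div = $xpath->query('//div[@id="maindiv"]')->item(0);

// Counterpart of $part->find('b', 0)->outertext = '': detach the first <b>.
$firstB = $div->getElementsByTagName('b')->item(0);
if ($firstB !== null) {
    $firstB->parentNode->removeChild($firstB);
}

// textContent now holds only the remaining text plus surrounding whitespace.
$text = trim($div->textContent);
echo $text; // I want this text
```

Removing the node changes the document itself, so this suits a throwaway DOMDocument; the XPath approaches leave the tree untouched.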