Extracting node values with XPath in PHP

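There is a section on amazon.com from which I want to extract the data for each item (only the node values, not the links).

The values I'm looking for are inside the <span class="refinementLink"> and <span class="narrowValue"> elements: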

<ul data-typeid="n" id="ref_1000">
    <li style="margin-left: -18px">
        <a href="/s/ref=sr_ex_n_0?rh=i%3Aaps%2Ck%3Ahow+to+grow+tomatoes&amp;sort=salesrank&amp;keywords=how+to+grow+tomatoes&amp;ie=UTF8&amp;qid=1327603358">
            <span class="expand">Any Department</span>
        </a>
    </li>
    <li style="margin-left: 8px">
        <strong>Books</strong>
    </li>
    <li style="margin-left: 6px">
        <a href="/s/ref=sr_nr_n_0?rh=k%3Ahow+to+grow+tomatoes%2Cn%3A283155%2Cp_n_feature_browse-bin%3A618073011%2Cn%3A%211000%2Cn%3A48&amp;bbn=1000&amp;sort=salesrank&amp;keywords=how+to+grow+tomatoes&amp;ie=UTF8&amp;qid=1327603358&amp;rnid=1000">
            <span class="refinementLink">Crafts, Hobbies & Home</span><span class="narrowValue">(19)</span>
        </a>
    </li>
    <li style="margin-left: 6px">
       <a href="/s/ref=sr_nr_n_1?rh=k%3Ahow+to+grow+tomatoes%2Cn%3A283155%2Cp_n_feature_browse-bin%3A618073011%2Cn%3A%211000%2Cn%3A10&amp;bbn=1000&amp;sort=salesrank&amp;keywords=how+to+grow+tomatoes&amp;ie=UTF8&amp;qid=1327603358&amp;rnid=1000">
            <span class="refinementLink">Health, Fitness & Dieting</span><span class="narrowValue">(3)</span>
        </a>
    </li>
    <li style="margin-left: 6px">
        <a href="/s/ref=sr_nr_n_2?rh=k%3Ahow+to+grow+tomatoes%2Cn%3A283155%2Cp_n_feature_browse-bin%3A618073011%2Cn%3A%211000%2Cn%3A6&amp;bbn=1000&amp;sort=salesrank&amp;keywords=how+to+grow+tomatoes&amp;ie=UTF8&amp;qid=1327603358&amp;rnid=1000">
            <span class="refinementLink">Cookbooks, Food & Wine</span><span class="narrowValue">(2)</span>
        </a>
    </li>
</ul>

The following expression should work:

//*[@id='ref_1000']/li/a/span[@class='narrowValue']

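For better performance you could provide a direct path to the start of this expression, but the one given is more flexible (since you probably need it to work across multiple pages).

Also keep in mind that your HTML parser may produce a different result tree than the one generated by Firebug (which is where I tested). Here is an even more flexible solution: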

//*[@id='ref_1000']//span[@class='narrowValue']

That flexibility comes with a potential performance (and accuracy) cost, but it is often the only option when dealing with tag soup.
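To compare the two expressions above, here is a minimal sketch that runs both through DOMXPath. The $html string is a trimmed-down, hypothetical stand-in for the markup in the question, not the real Amazon page, so on live pages the results may differ.

```php
<?php
// Hypothetical sample: a shortened version of the list from the question.
$html = <<<'HTML'
<ul data-typeid="n" id="ref_1000">
  <li><strong>Books</strong></li>
  <li><a href="#"><span class="refinementLink">Crafts, Hobbies &amp; Home</span><span class="narrowValue">(19)</span></a></li>
  <li><a href="#"><span class="refinementLink">Health, Fitness &amp; Dieting</span><span class="narrowValue">(3)</span></a></li>
</ul>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Strict path: the span must be exactly ul > li > a > span.
$strict = $xpath->query("//*[@id='ref_1000']/li/a/span[@class='narrowValue']");
// Loose path: the span may sit at any depth below the list.
$loose = $xpath->query("//*[@id='ref_1000']//span[@class='narrowValue']");

$counts = [];
foreach ($loose as $node) {
    $counts[] = trim($node->nodeValue);
}
print_r($counts);
```

On this sample both queries match the same two nodes. They only diverge when the parser builds a tree with extra or missing wrapper elements.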

If you need to grab the category names:

// Suppress invalid markup warnings
libxml_use_internal_errors(true);

// Create SimpleXML object
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html); // $html - string fetched by CURL 
$xml = simplexml_import_dom($doc);

// Find the category nodes
$categories = $xml->xpath("//span[@class='refinementLink']");
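The xpath() call returns an array of SimpleXMLElement objects, so each one has to be cast to a string to get its text. A small sketch of that step, assuming a hypothetical one-item $html sample in place of the page fetched with cURL:

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<ul id="ref_1000">'
      . '<li><a href="#"><span class="refinementLink">Crafts, Hobbies &amp; Home</span>'
      . '<span class="narrowValue">(19)</span></a></li>'
      . '</ul>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

$names = [];
foreach ($xml->xpath("//span[@class='refinementLink']") as $category) {
    // Casting a SimpleXMLElement to string yields its text content.
    $names[] = trim((string) $category);
}
print_r($names);
```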

EDIT: using DOMDocument

$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Select the parent node
$categories = $xpath->query("//span[@class='refinementLink']/..");
foreach ($categories as $category) {
    echo '';
    echo $category->childNodes->item(1)->firstChild->nodeValue;
    echo $category->childNodes->item(2)->firstChild->nodeValue;
    echo '';
    // Crafts, Hobbies & Home(19)
}
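Indexing into childNodes depends on exactly where the parser puts whitespace text nodes, so it can fail with a "non-object" notice if the tree differs slightly. One possible alternative, sketched here under the assumption of a hypothetical shortened $html sample, is to run relative XPath queries against each link node with DOMXPath::evaluate():

```php
<?php
// Hypothetical sample: a shortened version of the list from the question.
$html = <<<'HTML'
<ul id="ref_1000">
  <li><strong>Books</strong></li>
  <li><a href="#"><span class="refinementLink">Crafts, Hobbies &amp; Home</span><span class="narrowValue">(19)</span></a></li>
  <li><a href="#"><span class="refinementLink">Cookbooks, Food &amp; Wine</span><span class="narrowValue">(2)</span></a></li>
</ul>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
// Select only the links that contain a refinementLink span.
foreach ($xpath->query("//*[@id='ref_1000']//a[span[@class='refinementLink']]") as $link) {
    // string(...) returns an empty string rather than null when nothing matches.
    $name  = $xpath->evaluate("string(span[@class='refinementLink'])", $link);
    $count = $xpath->evaluate("string(span[@class='narrowValue'])", $link);
    $rows[] = trim($name) . ' ' . trim($count);
}
echo implode("\n", $rows), "\n";
```

Because the second argument makes each query relative to the current link, a missing span just produces an empty string instead of an error.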

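I strongly recommend you check out phpQuery. It is essentially a jQuery selector engine for PHP, so to get the text you want you could do something like this: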

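foreach (pq('span.refinementLink') as $p) {
  // Iterating yields plain DOM nodes, so wrap each one with pq() again
  print pq($p)->text() . "\n";
}

This should output something like:

Crafts, Hobbies & Home
Health, Fitness & Dieting
Cookbooks, Food & Wine

This is by far the simplest method of screen scraping and DOM parsing in PHP that I know of.

What are you using to parse the HTML?

I'm using curl to grab the page, and then DOMDocument and XPath to select the data.

You said you want narrowValue, but your code uses refinementLink. Which is it? Either way, my solution below should work. Just substitute the class name you need.

I want both refinementLink and narrowValue. It still shows an empty array, even after updating to the XPath query you provided.

@NewBee - Are you sure the array is empty? Or maybe you shouldn't be using nodeValue, but something like textContent or text instead? I don't use PHP, so I'm not sure, but you need to verify exactly where the error is.

If I use double quotes for the class value (e.g. "refinementLink"), I get an empty array with no results, and if I use single quotes like you did, I get an unexpected T_STRING error.

@NewBee - I don't know PHP, but this looks suspicious: $rank[]=(trim($word->nodeValue)). Do you really mean to assign to $rank[], or should there be an index in those brackets?

I have the same code with $arry[] in my files, so I don't think that is the problem.

Thanks, can I use this without the XML object? Currently I'm using a DOMDocument object and then applying the query to it.

Using the XPath you provided, I get all the nodes with span class refinementLink. What I want is to take just the first node out.

Notice: Trying to get property of non-object in D:\wamp\www\edwin\scrasting.php on line 217. I get this error using your code. Line 217 holds echo $category->childNodes->item(1)->firstChild->nodeValue;

Your XML structure may be a bit different. But the XPath expression should be fine.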