Scraping an HTML page with XPath and PHP

I'm trying to scrape an HTML page with this PHP code:

<?php
    ini_set('display_errors', 1);

    $url = 'http://www.cittadellasalute.to.it/index.php?option=com_content&view=article&id=6786:situazione-pazienti-in-pronto-soccorso&catid=165:pronto-soccorso&Itemid=372';


    //#Set CURL parameters: pay attention to the PROXY config !!!!
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');
    $data = curl_exec($ch);
    curl_close($ch);

    $dom = new DOMDocument();
    @$dom->loadHTML($data);

    $xpath = new DOMXPath($dom);

    $greenWaitingNumber = $xpath->query('/html/body/div/div/div[4]/div[3]/section/p');


    foreach( $greenWaitingNumber as $node )
    {
      echo "Number first green line: " .$node->nodeValue;
      echo '<br>';
      echo '<br>';
    }


?>

Everything seems to work (no errors, and in my browser console I can see "200" as the return code...), but nothing is printed in my HTML page.

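The problem may be the XPath /html/body/div/div/div[4]/div[3]/section/p, which is supposed to refer to the first green line in the source HTML page, but that is the path Firefox Firebug gives me for that part of the page.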

Any suggestions or examples?

UPDATE

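As Santosh Sapkota suggested in the reply, the first problem is that the text in the green box is loaded from an iframe. I found the URL of the HTML page in the iframe, so I tried using that URL in my code, which is now: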

<?php
    ini_set('display_errors', 1);

    $url = 'http://listeps.cittadellasalute.to.it/?id=01090101';


    //#Set CURL parameters: pay attention to the PROXY config !!!!
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');
    $data = curl_exec($ch);
    curl_close($ch);

    $dom = new DOMDocument();
    @$dom->loadHTML($data);

    $xpath = new DOMXPath($dom);

    $greenWaitingNumber = $xpath->query('/html/body/div/div/div[4]/div[3]/section/p');


    foreach( $greenWaitingNumber as $node )
    {
      echo "Number first green line: " .$node->nodeValue;
      echo '<br>';
      echo '<br>';
    }


?>

Unfortunately, nothing is printed in my output HTML page yet.

Any other suggestions or examples?

The XPath is definitely your problem. Also check whether the content is loaded from an iframe.
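
One way to check whether the XPath is at fault is to count the nodes each query returns before looping over them. The sketch below is only an illustration: the markup in `$html`, the `green` class name, and the relative query are assumptions, since the real page's structure is not shown in this thread. It compares the absolute path copied from Firebug with a shorter relative expression.

```php
<?php
// Hypothetical markup standing in for the fetched page (assumption, not the real site).
$html = '<html><body><div id="wrap"><section class="green"><p>12</p></section></div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Absolute path as copied from the browser tool; it only matches if the
// parsed document has exactly this nesting.
$absolute = $xpath->query('/html/body/div/div/div[4]/div[3]/section/p');

// Relative path anchored on an attribute (the class name is an assumption).
$relative = $xpath->query('//section[@class="green"]/p');

$absoluteCount = $absolute->length;
$relativeCount = $relative->length;
$firstValue = $relativeCount > 0 ? trim($relative->item(0)->nodeValue) : null;

echo "absolute matches: $absoluteCount\n";
echo "relative matches: $relativeCount\n";
```

If a query reports zero matches, the `foreach` body never runs, which would explain a page that loads without errors but prints nothing. Paths copied from a browser describe the DOM after the browser has processed the page, which can differ from the raw HTML that cURL receives.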

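How do I get the correct XPath when there is an iframe?

If you are trying to get the text in the green box, you can clearly see that it is loaded from an iframe.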
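
Content inside an iframe belongs to a separate document, so an XPath run against the outer page cannot reach it. A minimal sketch of one approach, assuming the outer page contains a plain `<iframe src="...">` element: read the `src` attribute first, then fetch and query that URL on its own. The helper name `findIframeSources` and the sample markup in `$outerHtml` are hypothetical; only the iframe URL comes from the question.

```php
<?php
// Hypothetical stand-in for the outer page; the real markup is not shown in the thread.
$outerHtml = '<html><body><div>'
    . '<iframe src="http://listeps.cittadellasalute.to.it/?id=01090101"></iframe>'
    . '</div></body></html>';

// Returns the src attribute of every iframe found in the given HTML string.
function findIframeSources(string $html): array
{
    if ($html === '') {
        return [];                      // nothing to parse
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep malformed-HTML warnings quiet
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $sources = [];
    foreach ($xpath->query('//iframe/@src') as $attr) {
        $sources[] = $attr->nodeValue;
    }
    return $sources;
}

$iframeSources = findIframeSources($outerHtml);
print_r($iframeSources);
```

Each URL found this way can be fetched with the same cURL setup as in the question. The XPath then has to be written against the iframe document's own structure, not the outer page's, and a relative `src` value would first need to be resolved against the outer page's URL.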