Warning: file_get_contents(/data/phpspider/zhask/data//catemap/1/php/291.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Php Scraper返回空数组_Php_Curl_Xpath_Web Scraping_Domdocument - Fatal编程技术网

Php Scraper返回空数组

Php Scraper返回空数组,php,curl,xpath,web-scraping,domdocument,Php,Curl,Xpath,Web Scraping,Domdocument,我对curl和xpath有点陌生,所以仍然在学习in和out。我已经编写了一个scraper,但是当我试图通过数组显示被刮取的数据时,什么都没有显示。那么我的代码有什么问题 <?php ini_set("display_errors", "1"); error_reporting(-1); error_reporting(E_ERROR); libxml_use_internal_errors(true); //Basic Function function get_url_conte

我对curl和xpath有点陌生,所以仍然在学习in和out。我已经编写了一个scraper,但是当我试图通过数组显示被刮取的数据时,什么都没有显示。那么我的代码有什么问题

<?php

ini_set("display_errors", "1");
error_reporting(-1);
error_reporting(E_ERROR);
libxml_use_internal_errors(true);

//Basic Function
function get_url_contents($url, $timeout = 10, $userAgent = 'Mozilla/5.0(Macintosh; U; Intel Mac OS X 10_5_8; en-US)AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.215 Safari/534.10'){
    $rawhtml = curl_init();//handler
    curl_setopt($rawhtml, CURLOPT_URL,$url);//url
    curl_setopt($rawhtml, CURLOPT_RETURNTRANSFER, 1);//return result as string rahter than direct output
    curl_setopt($rawhtml, CURLOPT_CONNECTTIMEOUT,$timeout);//set timeout
    curl_setopt($rawhtml, CURLOPT_USERAGENT,$userAgent);//set user agent
    $output = curl_exec($rawhtml);//execute curl call
    curl_close($rawhtml);//close connection

    if(!$output){
        return -1;//if nothing obtained, return -1
    }
    return $output;
}

//get raw html
$html_string = get_url_contents("http://www.beursgorilla.nl/fonds-informatie.asp?naam=Aegon&cat=koersen&subcat=1&instrumentcode=955000020");//url here
//load HTML into DOM object
//ref http://www.php.net/manual/en/domdocument.loadhtml.php
//note html does not have to be well fpr,ed with this function

$dom_object = new DOMDocument();
@$dom_object->loadHTML($html_string);

//perform Xpath queries on DOM
//ref http://www.php.net/manual/en/domxpath.query.php

$xpath = new DOMXPath($dom_object);

//perform Xpath query
//use any specfic property to narrow focus

$nodes = $xpath->query("//table[@class='maintable']/tbody/tr[4]/td[2]/table[@class='koersen_tabel']/tbody/tr[2]/td[@class='koersen_tabel_midden']");

//setup some basic variables

$i = -1; //$i = counter

//when process nodes as below, cycling trough
//but not grabbing data from the header row of the table

$result = array();

//preform xpath subqueries to get numbers

foreach($nodes as $node){
    $i++;
    //using each 'node' as the limit for the new xpath to search within
    //make queries relative by starting them with a dot (e.g. ".//...")

    $details = $xpath->query("//table[3]/tbody/tr/td[1]/table[@class='fonds_info_koersen_links']/tbody/tr[1]/td[2]", $node);
    foreach($details as $detail){
        $result[$i][''] = $detail->nodeValue;
    }

    $details = $xpath->query("//table[3]/tbody/tr/td[1]/table[@class='fonds_info_koersen_links']/tbody/tr[4]/td[2]", $node);
    foreach($details as $detail){
         $result[$i][''] = $detail->nodeValue;
    }

    if(curl_errno($rawhtml)){
        echo 'Curl error: ' . curl_error($rawhtml);

        print'<pre>';   
        print_r($result);
        print '</pre>';
    }
}

?>
loadHTML($html\u字符串);
//在DOM上执行Xpath查询
//参考号http://www.php.net/manual/en/domxpath.query.php
$xpath=新的DOMXPath($dom\u对象);
//执行Xpath查询
//使用任何specfic属性来缩小焦点
$nodes=$xpath->query(//table[@class='maintable']/tbody/tr[4]/td[2]/table[@class='koersen_tabel']]/tbody/tr[2]/td[@class='koersen_tabel_midden']);
//设置一些基本变量
$i=-1//$i=计数器
//当处理如下节点时,循环槽
//但不从表的标题行获取数据
$result=array();
//预执行xpath子查询以获取数字
foreach($node作为$node){
$i++;
//使用每个“节点”作为新xpath在其中搜索的限制
//通过以点开头(例如“//…”)使查询成为相对查询
$details=$xpath->query(//table[3]/tbody/tr/td[1]/table[@class='fonds\u info\u koersen\u links']/tbody/tr[1]/td[2]”,$node);
foreach($details作为$detail){
$result[$i]['']=$detail->nodeValue;
}
$details=$xpath->query(//table[3]/tbody/tr/td[1]/table[@class='fonds\u info\u koersen\u links']/tbody/tr[4]/td[2]”,$node);
foreach($details作为$detail){
$result[$i]['']=$detail->nodeValue;
}
if(curl_errno($rawhtml)){
回显“Curl error:”.Curl_error($rawhtml);
打印“”;
$result[$i][''] = $detail->nodeValue;
打印(结果); 打印“”; } } ?>

我已经通过Chrome的元素检查器检查了xpath查询,它们似乎是正确的。我真的不知道代码出了什么问题。

这行代码呢

$result[$i][] = $detail->nodeValue;
这不应该是:


(看大括号)

我重写了我的爬虫程序,并使用了PHP简单HTML DOM解析器。这解决了我的问题,现在一切正常:)。

使用更多的
echo
查看脚本中发生了什么-打印所有变量,以及如果执行/foreach,则打印哪个