
How to scrape web pages with pagination in PHP


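I am setting up a new server and want to pull some information from a website.

Here is my code. I am trying to scrape the site page by page, but I only get 2 pages: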

$result = array();
function scrapingAnimelist($url, $page)
{

    $res = array();
    $urlParsed = $url . "&page=" . $page;
    $html = file_get_html($urlParsed);

    $pageData = array();
    foreach ($html->find('div[class=body]') as $item) {
        $metaData = array();
        $metaData['title'] = $item->find('h2[class=title]', 0)->innertext;
        $metaData['img'] = $item->find('img[class=img]', 0)->src;
        $metaData['url'] = $item->find('a', 0)->href;
        array_push($pageData, $metaData);
    }

    $res[$page] = $pageData;

    if (sizeof($pageData) == 20) {
        $page++;
        $res[$page] = scrapingAnimelist($url, $page);
    }
    global $result;
    $result = $res;


    return $pageData;

}

I expect the output to be a JSON object with 3 arrays (one per page of data), but I only get 2.

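Your $result is not set correctly on the second run: each recursive call builds its own local $res and then overwrites the global $result with it, so only the array built by the outermost call survives.

You should do it like this: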

$result = array();
function scrapingAnimelist($url, $page) {
    // Write straight into the shared array instead of a per-call copy.
    global $result;
    $urlParsed = $url . "&page=" . $page;
    $html = file_get_html($urlParsed);
    $pageData = array();
    foreach ($html->find('div[class=body]') as $item) {
        $metaData = array();
        $metaData['title'] = $item->find('h2[class=title]', 0)->innertext;
        $metaData['img'] = $item->find('img[class=img]', 0)->src;
        $metaData['url'] = $item->find('a', 0)->href;
        array_push($pageData, $metaData);
    }
    $result[$page] = $pageData;
    // A full page (20 items) suggests there is another page to fetch.
    if (sizeof($pageData) == 20) {
        return scrapingAnimelist($url, $page + 1);
    }
    return $result;
}
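
The same idea can also be written as a plain loop, which avoids both the recursion and the global variable. The following is only a sketch under stated assumptions: scrapeAllPages, $fetchPage, $maxPages and $fakeFetch are hypothetical names that do not appear in the question, and the fetcher here is a stand-in that simulates a site with 45 items so the example runs without Simple HTML DOM or network access.

```php
<?php
// Hypothetical helper: collects pages until one comes back shorter than $pageSize.
// $fetchPage is any callable that takes a page number and returns an array of items.
function scrapeAllPages(callable $fetchPage, int $pageSize = 20, int $maxPages = 100): array
{
    $result = array();
    for ($page = 1; $page <= $maxPages; $page++) {
        $pageData = $fetchPage($page);
        $result[$page] = $pageData;
        // A short page means there is nothing after it.
        if (count($pageData) < $pageSize) {
            break;
        }
    }
    return $result;
}

// Stand-in for the real scraper (assumption): pretends the site has 45 items, 20 per page.
$fakeFetch = function (int $page): array {
    $total = 45;
    $start = ($page - 1) * 20;
    $count = max(0, min(20, $total - $start));
    $items = array();
    for ($i = 0; $i < $count; $i++) {
        $items[] = array('title' => 'Item ' . ($start + $i + 1));
    }
    return $items;
};

$all = scrapeAllPages($fakeFetch);
echo count($all), " pages\n";
```

In real use, $fetchPage would wrap the file_get_html call and the foreach loop from the answer and return $pageData. The $maxPages cap is an added safeguard so a site that always returns full pages cannot cause an endless loop.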