PHP: unable to fetch content spread across multiple pages

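I've written a script in PHP to scrape titles and their links from a webpage. The site spreads its content across multiple pages, and my script below only parses the titles and links on the landing page.

How can I modify my existing script to get data from multiple pages (up to 10 pages)?

Here is my attempt so far: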

<?php
include "simple_html_dom.php";
$link = "https://stackoverflow.com/questions/tagged/web-scraping?page=2";
function get_content($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $htmlContent = curl_exec($ch);
    curl_close($ch);
    $dom = new simple_html_dom();
    $dom->load($htmlContent);
    foreach($dom->find('.question-summary') as $file){
        $itemTitle = $file->find('.question-hyperlink', 0)->innertext;
        $itemLink = $file->find('.question-hyperlink', 0)->href;
        echo "{$itemTitle},{$itemLink}<br>";
    }
}
get_content($link);
?>
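For comparison, the same extraction step can be sketched with PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of simple_html_dom. This is a swapped-in technique, not the poster's approach: it drops the third-party include and lets the parsing be tested on a fixed HTML string. The function name extract_questions is hypothetical, and it assumes the page still uses the question-summary and question-hyperlink classes that the original selectors target (the live markup may differ).

```php
<?php
// Hypothetical helper: extract question titles and links using PHP's
// built-in DOM extension instead of simple_html_dom. Assumes the markup the
// original selectors target: an element with class "question-summary"
// containing an <a> with class "question-hyperlink".
function extract_questions(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings silently
    // Common workaround so libxml reads the page as UTF-8 rather than ISO-8859-1.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // XPath equivalent of the CSS class selectors ".question-summary" / ".question-hyperlink".
    $hasClass = fn(string $class): string =>
        "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";

    $rows = [];
    foreach ($xpath->query('//*[' . $hasClass('question-summary') . ']') as $summary) {
        $anchor = $xpath->query('.//a[' . $hasClass('question-hyperlink') . ']', $summary)->item(0);
        if ($anchor === null) {
            continue; // skip summaries without a title link instead of dereferencing null
        }
        $rows[] = [
            'title' => trim($anchor->textContent), // entity-decoded text, e.g. "&amp;" becomes "&"
            'link'  => $anchor->getAttribute('href'),
        ];
    }
    return $rows;
}

foreach (extract_questions('<div class="question-summary"><a class="question-hyperlink" href="/questions/1/a">Example</a></div>') as $row) {
    echo "{$row['title']},{$row['link']}", PHP_EOL;
}
```

Unlike simple_html_dom's innertext, textContent returns decoded text rather than raw inner HTML, and the href values stay relative (for example /questions/1/a), exactly as they appear in the page.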

The site paginates through the query string: ?page=2, ?page=3, etc.
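Because only the page query parameter changes, the page URLs can be generated up front. A minimal sketch, assuming an absolute http(s) URL; the helper name build_page_urls is hypothetical. It rewrites the page parameter with parse_url, parse_str and http_build_query, so it also works when the starting URL already carries ?page=2 like the one in the question (port, user info and fragment are not preserved).

```php
<?php
// Hypothetical helper: build the listing URLs for pages $first..$last by
// setting the "page" query parameter. Assumes an absolute http(s) URL;
// port, user info and fragment are not preserved.
function build_page_urls(string $url, int $first, int $last): array
{
    $parts = parse_url($url);
    parse_str($parts['query'] ?? '', $query); // e.g. "page=2" -> ['page' => '2']
    $base = $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '/');

    $urls = [];
    for ($page = $first; $page <= $last; $page++) { // inclusive upper bound
        $query['page'] = $page;                     // replaces any existing page number
        $urls[] = $base . '?' . http_build_query($query);
    }
    return $urls;
}

// Pages 1 to 10 of the listing from the question.
foreach (build_page_urls('https://stackoverflow.com/questions/tagged/web-scraping?page=2', 1, 10) as $u) {
    echo $u, PHP_EOL;
}
```

Other query parameters are kept, so a URL such as https://example.com/list?tab=new becomes https://example.com/list?tab=new&page=3 for page 3.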

Following Nima's suggestion, this is how I got it working:

<?php
include "simple_html_dom.php";
// Base URL without the page number; the loop below appends 1, 2, ..., 10.
$link = "https://stackoverflow.com/questions/tagged/web-scraping?page=";

function get_content($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    $htmlContent = curl_exec($ch);
    curl_close($ch);
    if ($htmlContent === false) {
        return; // request failed: skip this page
    }
    $dom = new simple_html_dom();
    $dom->load($htmlContent);
    foreach ($dom->find('.question-summary') as $file) {
        $anchor = $file->find('.question-hyperlink', 0);
        if (!$anchor) {
            continue; // summary without a title link
        }
        echo "{$anchor->innertext},{$anchor->href}<br>";
    }
}

// Pages 1 through 10 inclusive.
for ($i = 1; $i <= 10; $i++) {
    get_content($link . $i);
}
?>
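Echoing "{$itemTitle},{$itemLink}" produces ambiguous output when a title itself contains a comma or a double quote. If the rows are collected into an array first, fputcsv can take care of quoting. A sketch under that assumption; rows_to_csv and the ['title' => ..., 'link' => ...] row shape are hypothetical, not part of the original script.

```php
<?php
// Hypothetical helper: turn collected [title, link] rows into CSV text.
// fputcsv() quotes fields that contain commas, quotes or whitespace, so a
// title such as "Scraping, with curl" stays a single field.
function rows_to_csv(array $rows): string
{
    $out = fopen('php://memory', 'r+');
    // Explicit delimiter, enclosure and escape; '' disables the non-standard backslash escape.
    fputcsv($out, ['title', 'link'], ',', '"', '');
    foreach ($rows as $row) {
        fputcsv($out, [$row['title'], $row['link']], ',', '"', '');
    }
    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}

echo rows_to_csv([
    ['title' => 'Scraping, with curl', 'link' => '/questions/1/a'],
]);
```

Embedded double quotes are doubled inside the enclosing quotes, so spreadsheet tools and str_getcsv can read each title back as one field.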

Have you considered using a loop? As a very basic example, something like $link = "https://...?page="; for ($i = 0; $i …
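A loop like the one suggested can also stop early when a page comes back empty or a request fails, and pause between requests. The sketch below uses hypothetical names: scrape_pages takes the fetch and parse steps as callables so the loop logic can be exercised without network access, and curl_fetch is one possible fetcher that treats any non-200 response as a failure. Treating an empty page as "past the last page" is an assumption about the site.

```php
<?php
// Hypothetical sketch: fetch pages 1..$maxPages, stop at the first failed or
// empty page, and pause between requests. $fetch(string $url): ?string and
// $parse(string $html): array are injected so the loop can run offline.
function scrape_pages(string $baseUrl, int $maxPages, callable $fetch, callable $parse, int $delayMicros = 1000000): array
{
    $all = [];
    for ($page = 1; $page <= $maxPages; $page++) {   // inclusive: page $maxPages is fetched too
        $html = $fetch($baseUrl . $page);
        if ($html === null) {
            break;                                  // request failed
        }
        $rows = $parse($html);
        if (count($rows) === 0) {
            break;                                  // no results: assume we ran past the last page
        }
        $all = array_merge($all, $rows);
        if ($page < $maxPages && $delayMicros > 0) {
            usleep($delayMicros);                   // be gentle with the server between requests
        }
    }
    return $all;
}

// One possible $fetch: returns the body, or null on a cURL error or non-200 status.
function curl_fetch(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // seconds; an arbitrary choice
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // No explicit curl_close(): the handle is released when $ch goes out of scope.
    return ($body === false || $status !== 200) ? null : $body;
}
```

With a parse callable that returns one page's rows as an array (for example a variant of get_content that returns rows instead of echoing them), a call might look like scrape_pages("https://stackoverflow.com/questions/tagged/web-scraping?page=", 10, 'curl_fetch', $parse).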