PHP: unable to fetch content spanning multiple pages

php curl web-scraping


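I have written a script in PHP that scrapes the titles and their links from a web page. The site spreads its content across multiple pages. My script below can parse the titles and links from the landing page.

How can I modify the existing script to fetch data from multiple pages (up to 10 pages)?

This is my attempt so far: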

<?php
include "simple_html_dom.php";
$link = "https://stackoverflow.com/questions/tagged/web-scraping?page=2";
function get_content($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $htmlContent = curl_exec($ch);
    curl_close($ch);
    $dom = new simple_html_dom();
    $dom->load($htmlContent);
    foreach($dom->find('.question-summary') as $file){
        $itemTitle = $file->find('.question-hyperlink', 0)->innertext;
        $itemLink = $file->find('.question-hyperlink', 0)->href;
        echo "{$itemTitle},{$itemLink}<br>";
    }
}
get_content($link);
?>

The site paginates by incrementing the page parameter: ?page=2, ?page=3, etc.

Here is how I got it working, following Nima's suggestion:

<?php
include "simple_html_dom.php";

// Base URL; the page number is appended in the loop below.
$link = "https://stackoverflow.com/questions/tagged/web-scraping?page=";

function get_content($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $htmlContent = curl_exec($ch);
    curl_close($ch);

    // curl_exec() returns false on failure; skip this page in that case.
    if ($htmlContent === false) {
        return;
    }

    $dom = new simple_html_dom();
    $dom->load($htmlContent);
    foreach ($dom->find('.question-summary') as $file) {
        $anchor = $file->find('.question-hyperlink', 0);
        if ($anchor === null) {
            continue; // no title link in this block
        }
        echo "{$anchor->innertext},{$anchor->href}<br>";
    }
}

// Fetch pages 1 through 10 inclusive.
for ($i = 1; $i <= 10; $i++) {
    get_content($link . $i);
}
?>
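One optional refinement, offered only as a sketch: keeping the parsing step separate from the fetching step makes it possible to check the extraction logic against a fixed HTML string, without hitting the network. The example below uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs on its own. The helper name extract_questions and the sample markup are hypothetical; the class names question-summary and question-hyperlink are the ones used in the script above. Unlike that script, it collects every matching link inside a summary block rather than only the first.

```php
<?php
// Hypothetical helper: returns [title, link] pairs found in one page of HTML.
function extract_questions(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely strictly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // XPath equivalent of the CSS selector ".question-summary a.question-hyperlink".
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " question-summary ")]'
           . '//a[contains(concat(" ", normalize-space(@class), " "), " question-hyperlink ")]';

    $rows = [];
    foreach ($xpath->query($query) as $anchor) {
        $rows[] = [trim($anchor->textContent), $anchor->getAttribute('href')];
    }
    return $rows;
}

// Sample markup invented for illustration only.
$sample = '<div class="question-summary"><a class="question-hyperlink" href="/questions/1/a">First</a></div>'
        . '<div class="question-summary"><a class="question-hyperlink" href="/questions/2/b">Second</a></div>';

foreach (extract_questions($sample) as [$title, $href]) {
    echo "{$title},{$href}\n";
}
```

In a real run, the string returned by curl_exec() would be passed to extract_questions() in place of $sample.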

Have you considered using a loop? As a very basic example, something like $link = "https://...?page="; for ($i = 0; $i ...
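The loop idea from this comment can be sketched roughly as follows. The helper name build_page_urls and its $lastPage parameter are hypothetical, and the sketch assumes that page numbering starts at 1 (that is, ?page=1 is the first page of results); the base URL is the one from the question.

```php
<?php
// Hypothetical helper: builds the URLs for pages 1 through $lastPage inclusive.
function build_page_urls(string $base, int $lastPage): array
{
    $urls = [];
    for ($page = 1; $page <= $lastPage; $page++) {
        $urls[] = $base . $page; // append the page number to the base URL
    }
    return $urls;
}

$base = "https://stackoverflow.com/questions/tagged/web-scraping?page=";
foreach (build_page_urls($base, 10) as $url) {
    echo $url, "\n"; // each URL would be handed to get_content($url)
}
```

Using `<=` in the loop condition is what makes the tenth page part of the run; with `<` the loop would stop at page 9.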
