PHP: Extracting content from multiple pages of the same website

php dom

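I have a script that extracts data from multiple pages of the same website. There are about 120 pages.

Here is the code I use to fetch a single page: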

$html = file_get_contents('https://www.example.com/product?page=1');
$dom = new DOMDocument;
@$dom->loadHTML($html);
$links = $dom->getElementsByTagName('div');

foreach ($links as $link) {
    file_put_contents('products.txt', $link->getAttribute('data-product-name') . PHP_EOL, FILE_APPEND);
}

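How can I do this for multiple pages? The page links are incremental, so the next page is

https://www.example.com/product?page=2

and so on. How can I do this without creating a separate file for each link?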

How about:

function extractContent($page)
{
    $html = file_get_contents('https://www.example.com/product?page='.$page);
    $dom = new DOMDocument;
    @$dom->loadHTML($html);
    $links = $dom->getElementsByTagName('div');

    foreach ($links as $link) {
        // skip empty attributes
        if (empty($link->getAttribute('data-product-name'))) {
            continue;
        }
        file_put_contents('products.txt', $link->getAttribute('data-product-name') .PHP_EOL, FILE_APPEND);
    }
}

for ($i=1; $i<=120; $i++) {
    extractContent($i);
}
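
The answer above loops over every div on the page and skips the ones without a product name. As a minimal sketch of the same idea, the parsing step could be pulled into its own function that uses DOMXPath to visit only the divs carrying the attribute. The function name extractProductNames is hypothetical and not part of the original answer; trimming whitespace-only values is an added assumption about what counts as an empty line.

```php
<?php
// Hypothetical helper (not from the original answer): returns the non-empty
// data-product-name values found in one page of HTML.
function extractProductNames(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument;
    // Buffer libxml warnings from sloppy markup instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $names = [];
    // Only <div> elements that actually carry the attribute are visited.
    foreach ($xpath->query('//div[@data-product-name]') as $div) {
        $name = trim($div->getAttribute('data-product-name'));
        if ($name !== '') {
            $names[] = $name;
        }
    }

    return $names;
}
```

Because the function returns an array instead of writing to disk, the caller can decide whether to append per page with FILE_APPEND or to collect all 120 pages first and write products.txt once.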

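Thanks, it worked. But I ran into another problem: I am getting empty lines in the txt file. How can I remove them before saving?

Also, this does not stop unless I kill the PHP process.

The function is called 120 times. Change this value as needed.

To remove the empty lines, you can test the attribute's content before writing it. See my edit.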
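
The thread does not say why the script kept running. One possibility is a request that stalls, so the following is only a sketch under that assumption: it puts a read timeout on each request and ends the loop early when a page fails to load or yields nothing. The names fetchPage and collectLines, the 10-second default, and the stop-on-empty-page rule are all hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: fetch one URL with a read timeout so a stalled
// request cannot hang the script. Returns null on failure.
function fetchPage(string $url, float $timeoutSeconds = 10.0): ?string
{
    $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}

// Hypothetical helper: walks pages 1..$maxPages and stops early at the first
// page that fails to load or produces no lines.
// $fetch receives the page number and returns HTML or null;
// $extract receives the HTML and returns an array of lines.
function collectLines(callable $fetch, callable $extract, int $maxPages): array
{
    $lines = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        $html = $fetch($page);
        if ($html === null) {
            break; // request failed or timed out
        }

        $found = $extract($html);
        if ($found === []) {
            break; // assume we ran past the last page
        }

        $lines = array_merge($lines, $found);
    }

    return $lines;
}
```

Here $fetch could be a closure that calls fetchPage('https://www.example.com/product?page=' . $page), and $extract could be the DOM loop from the answer rewritten to return an array. The collected lines can then be written in one call, for example file_put_contents('products.txt', implode(PHP_EOL, $lines) . PHP_EOL).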