PHP: get the source of a group of links and save each link to a specific file


I'm trying to get the source of a set of links and save each link's source to a separate file.

$urls = array(
    'url1' => 'http://site1.com',
    'url2' => 'http://site2.com',
    'url3' => 'http://site3.com'
);
$files = array(
    'file1' => 'file1.html',
    'file2' => 'file2.html',
    'file3' => 'file3.html'
);
foreach ($urls as $url) {
    ob_start();
    $html = file_get_contents($url);
    $doc = new DOMDocument(); // create DOMDocument
    libxml_use_internal_errors(true);
    $doc->loadHTML($html); // load HTML you can add $html

    echo $doc->saveHTML();

    $page = ob_get_contents();
    ob_end_flush();
}
foreach ($files as $file) {
    $fp = fopen("$file","w");
    fwrite($fp,$page);
    fclose($fp);
}

At this point I'm stuck; it isn't working.

You need to read the URL and write the file in the same loop:

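// The two arrays use different keys (url1... vs file1...), so re-index
// both numerically and pair each URL with its file by position.
$files = array_values($files);
foreach (array_values($urls) as $i => $url) {
    file_put_contents($files[$i], file_get_contents($url));
}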

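There's no need to use DOMDocument unless you actually need to parse the HTML rather than just save the source. And there's absolutely no reason to use the ob_XXX functions; just assign the result directly to a variable or pass it to a function.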
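A minimal sketch of that direct approach, with a check for a failed read added. The helper name save_source is hypothetical, and a temporary local file stands in for a real URL so the example runs without network access.

```php
<?php
// Hypothetical helper: copy the source found at $url into $file.
// Returns true on success, false if the read or the write failed.
function save_source(string $url, string $file): bool
{
    // file_get_contents() returns false on failure; @ hides the warning
    $html = @file_get_contents($url);
    if ($html === false) {
        return false;
    }
    // file_put_contents() returns the number of bytes written, or false
    return file_put_contents($file, $html) !== false;
}

// Assumption: a local temp file plays the role of the remote page
$source = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($source, '<html><body>Hello</body></html>');
$target = tempnam(sys_get_temp_dir(), 'dst');

$ok = save_source($source, $target);                          // readable source
$missing = save_source($source . '.does-not-exist', $target); // unreadable source
```

Checking the return value matters because a failed download would otherwise be written out as an empty file.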

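As a design suggestion, when you have related data such as URLs and filenames, don't put them in separate arrays. Put them in a single two-dimensional array: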

$data = array(array('url' => 'http://site1.com',
                    'file' => 'file1.html'),
              array('url' => 'http://site2.com',
                    'file' => 'file2.html'),
              array('url' => 'http://site3.com',
                    'file' => 'file3.html'));

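This keeps all the related items together, and you don't have to worry about the two arrays getting out of sync when you update things.

Then you can loop over this array: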

foreach ($data as $datum) {
    $html = file_get_contents($datum['url']);
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    // Do stuff with $doc
    $page = $doc->saveHTML($doc);
    file_put_contents($datum['file'], $page);
}
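For the "Do stuff with $doc" step, here is a minimal sketch of one possible edit: replacing the text of the page's title element. The inline HTML and the new title are assumptions for illustration; in the loop above, $html would come from file_get_contents().

```php
<?php
// Assumption: inline HTML stands in for a downloaded page
$html = '<html><head><title>Old title</title></head><body><p>Hi</p></body></html>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();            // discard the collected warnings

// Edit: replace the text of the first <title> element, if there is one
$title = $doc->getElementsByTagName('title')->item(0);
if ($title !== null) {
    $title->nodeValue = 'New title';
}

// Serialize the edited document back to an HTML string
$page = $doc->saveHTML();
```

The resulting $page string is what would be passed to file_put_contents() for that entry.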

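In the first loop, you overwrite the variable $page every time, so the second loop just writes the last page to every file.

@Barmar I can't get that. Can you correct the code?

How can you not get that? Just step through the code by hand and it should be obvious. Every time through the loop it executes $page = ob_get_contents(), which discards the contents of the previous URL. So when the loop is done, the variable contains only the contents of the last URL.

Thanks a lot for the detailed response. I have to use DOMDocument because I need to parse the HTML and edit parts of it before saving to a file, so how can I use DOMDocument with the current code?

I've updated the answer to show how to do it with DOMDocument.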
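The overwriting described in these comments can be reproduced without any network access. In this sketch, short strings stand in for the downloaded pages (an assumption for illustration): the first loop mirrors the question's code, and the second keeps one entry per key.

```php
<?php
// Assumption: strings stand in for the downloaded source of each URL
$sources = array(
    'url1' => 'page one',
    'url2' => 'page two',
    'url3' => 'page three',
);

// As in the question: $page is reassigned on every pass, so each
// iteration throws away what the previous one stored
foreach ($sources as $source) {
    $page = $source;
}
// Here $page holds only the value from the final iteration

// Storing one entry per key preserves every page
$pages = array();
foreach ($sources as $key => $source) {
    $pages[$key] = $source;
}
```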