PHP: Get multiple divs from an external source

php html curl

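I am trying to display multiple divs from an external page on my own page. I have the following code to extract the divs. It works in a generic way, but I would like it to be more dynamic.

This code extracts the content from a given div ID and displays it on my own page: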

<?php
header("Content-Type: text/html; charset=utf-8");

function file_get_contents_curl($url) {
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}

//The URL for the external content we want to pull
$html = file_get_contents_curl("https://www.page.com/subdir/");

//parsing all content:
$doc = new DOMDocument();
@$doc->loadHTML($html);

$content = $html;

//The div that includes the content '<div id="ide">'
$first_step = explode( '<div id="ide">' , $content );
$second_step = explode("</div>" , $first_step[1] );

//Do some magic with the URL
$url2 = $second_step[0];
$url3 = $second_step[8];
$url4 = $second_step[16];

$patterns = array(
    '#\./opening;jsessionid=.*\?#',
    '#<a href=#',
    '#span(.*?)>#'
);

$replaces = array(
    'https://www.page.com/subdir/opening?',
    '<a target="_blank" href=',
    'h1>'
);

//Print the final output
///Merge the result into one variable
$final_output = 
        preg_replace($patterns, $replaces, $url2) . 
        $second_step[1] . /* Description -- NOTE: By commenting out this you need to change the H1 margin in the style declaration */
        $second_step[2] . /* From date */
        $second_step[3] . /* To date */
        $second_step[4] . /* Company */
        $second_step[5] . /* Employment condition (full-time/part-time) */
        $second_step[6] . /* Department */
        //$second_step[7] . 
        '<hr>' . /* Horizontal rule */
        preg_replace($patterns, $replaces, $url3) . 
        $second_step[9] . /* Description -- NOTE: By commenting out this you need to change the H1 margin in the style declaration */
        $second_step[10] . /* From date */
        $second_step[11] . /* To date */
        $second_step[12] . /* Company */
        $second_step[13] . /* Employment condition (full-time/part-time) */
        $second_step[14] . /* Department */
        //$second_step[15] . 
        '<hr>' . /* Horizontal rule */
        preg_replace($patterns, $replaces, $url4) . 
        $second_step[17] . /* Description -- NOTE: By commenting out this you need to change the H1 margin in the style declaration */
        $second_step[18] . /* From date */
        $second_step[19] . /* To date */
        $second_step[20] . /* Company */
        $second_step[21] . /* Employment condition (full-time/part-time) */
        $second_step[22] . /* Department */
        //$second_step[23] . 
        '<hr>'; /* Horizontal rule */

///Convert special chars
$converted = iconv("UTF-8", "UTF-8//TRANSLIT", $final_output);

///Display the final result
echo $converted;
?>

Thanks

You can try extracting it with a regular expression:

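// The "s" modifier lets "." match newlines, so divs spanning several lines are captured
preg_match_all('{<div[^<>]*>(?<content>.*?)</div>}s', $content, $matches);
// The named group holds the inner content; $matches[0] would include the <div> tags themselves
$array_of_contents = $matches['content'];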

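Now $array_of_contents is an array containing all the content inside those divs. Of course, it only covers the innermost divs, and they must all be on one nesting level.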
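A minimal sketch of that one-level limitation, using made-up markup (the `$flat` and `$nested` strings are assumptions, not the real page). It runs the same kind of pattern, with the named group `content` and the `s` modifier, first on sibling divs and then on the same divs wrapped in a parent:

```php
<?php
// Hypothetical markup: three sibling divs on one level.
$flat = '<div class="a">one</div><div>two</div><div>three</div>';

// Lazy match from an opening <div ...> to the first closing </div>.
$pattern = '{<div[^<>]*>(?<content>.*?)</div>}s';

preg_match_all($pattern, $flat, $matches);
$flatContents = $matches['content'];

// The same divs wrapped in a parent div, as in the question.
$nested = '<div id="ide">' . $flat . '</div>';

preg_match_all($pattern, $nested, $matches);
$nestedContents = $matches['content'];

print_r($flatContents);   // 'one', 'two', 'three'
print_r($nestedContents); // '<div class="a">one', 'two', 'three'
```

In the nested case the wrapper's opening tag is paired with the first `</div>` it meets, so the first entry contains a stray opening tag. This is why the pattern is only dependable when it is applied to markup whose divs do not contain other divs.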

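An id is an id; a page should not contain more than one element with the same id.

Thanks @n-dru, I took a closer look at the raw data I extract and edited my question. There is only one div with the specified ID, but that div has many child divs without any ID or class. The children of those child divs do have classes, though…

You should use DOMDocument or a similar tool instead of attempting a lot of string manipulation and regular expressions.

Thanks for the answer @MikeBrant. I am not very familiar with DOMDocument. I ran into problems fetching https URLs while using Phalanger as a PHP compiler for ASP, so I had to add a curl function and some DOM commands to manage the data extraction. But that is all I have done with DOM so far. I added this to the code in my last post. How can I use DOM to get the desired result here?

So if there are divs inside the child divs, they will not be included in this array? I added some raw data to my question to illustrate what I pull from the external page.

It will show everything inside each matched pair of div tags. I don't know what your structure is, so try it out. If this is a one-off and you don't need a general solution, work through it step by step until you have extracted everything you need.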
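As a rough sketch of the DOMDocument route suggested in the comments: the markup below is invented to resemble the structure described (one div with an ID, unnamed child divs, grandchildren with classes). The class names `title` and `company`, the sample links, and the `$jobs` array are all assumptions for illustration; in the real script `$html` would come from `file_get_contents_curl()`.

```php
<?php
// Hypothetical stand-in for the fetched page.
$html = '<html><body><div id="ide">'
    . '<div><div class="title"><a href="./opening?id=1">Job A</a></div>'
    . '<div class="company">Acme</div></div>'
    . '<div><div class="title"><a href="./opening?id=2">Job B</a></div>'
    . '<div class="company">Initech</div></div>'
    . '</div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of silencing with @
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$jobs = [];
// Select only the direct child divs of the wrapper; nested divs stay inside their parent.
foreach ($xpath->query('//div[@id="ide"]/div') as $item) {
    // Relative queries (leading ".") search inside the current child div only.
    $link    = $xpath->query('.//a', $item)->item(0);
    $company = $xpath->query('.//div[@class="company"]', $item)->item(0);

    $jobs[] = [
        'title'   => $link ? trim($link->textContent) : '',
        'href'    => $link ? $link->getAttribute('href') : '',
        'company' => $company ? trim($company->textContent) : '',
        'html'    => $doc->saveHTML($item), // raw markup of this child div
    ];
}

// One entry per child div, however many the page contains.
foreach ($jobs as $job) {
    echo $job['html'], '<hr>';
}
```

Because the loop runs once per child div, no array indexes have to be hard-coded, and link rewriting can be applied to `$job['href']` instead of to the whole string. Note that `loadHTML()` may misread UTF-8 text when the page does not declare its charset, so the encoding may need separate handling.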