PHP: Find elements with simple_html_dom and merge them

I want to extract all p elements of an HTML string with simple_html_dom. The order of the p elements must be preserved.

<section class="box_1">
    <header class="trigger"><h2>Title</h2></header>
    <div class="content">
        <div class="box_2">
            <div class="class"></div>
            <div class="content">
                <p>Text Level 2</p>
                <p>More Text Level 2</p>
            </div>
        </div>
        <div class="box_2">
            <div class="class"></div>
            <div class="content">
                <p>Text Level 2</p>
                <div class="box_3">
                    <div class="content">
                        <p>Text Level 3</p>
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>
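But with a plain find('p'), "Text Level 2" and "More Text Level 2" are handled as two elements. They should instead be merged into "Text Level 2\nMore Text Level 2" and handled as a single element.

So in this example the result should be an array with three elements (instead of four).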
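
To make the target concrete, here is a minimal sketch of the merge itself. It assumes the paragraph texts have already been grouped by their parent element; the `$groups` array is a hypothetical stand-in for that step and is not produced by simple_html_dom.

```php
<?php
// Hypothetical input: paragraph texts from the first example,
// already grouped by the element that contains them.
$groups = array(
    array('Text Level 2', 'More Text Level 2'),
    array('Text Level 2'),
    array('Text Level 3'),
);

// Join the paragraphs of each group with a newline.
$result = array_map(function (array $texts) {
    return implode("\n", $texts);
}, $groups);

print_r($result);
```

Four paragraphs go in and three merged entries come out, which is the shape asked for above.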

Update: I forgot something. There can also be p elements outside the section elements. See the "Lorem ipsum" paragraphs below.

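<p>Lorem ipsum</p>
<p>Lorem ipsum</p>
<section class="box_1">
    <header class="trigger"><h2>Title</h2></header>
    <div class="content">
        <div class="box_2">
            <div class="class"></div>
            <div class="content">
                <p>Text Level 2</p>
                <p>More Text Level 2</p>
            </div>
        </div>
        <div class="box_2">
            <div class="class"></div>
            <div class="content">
                <p>Text Level 2</p>
                <div class="box_3">
                    <div class="content">
                        <p>Text Level 3</p>
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>
<p>Lorem ipsum</p>
<p>Lorem ipsum</p>
<section class="box_1">
    <header class="trigger"><h2>Title</h2></header>
    <div class="content">
        <p>Text Level 1</p>
    </div>
</section>
<p>Lorem ipsum</p>
<p>Lorem ipsum</p>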

These p elements should be treated like the others (adjacent p elements of one block are combined into one entry). In this case, level = 0.

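You first have to determine which is which: whether a p tag is an orphan (outside any section) or not. Then, whenever you reach the end of a batch (no more p tags follow as siblings), just switch to the next key/batch. Consider this example: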

include 'simple_html_dom.php';
$html_string = '<p>Lorem ipsum</p><p>Lorem ipsum</p><section class="box_1"> <header class="trigger"><h2>Title</h2></header> <div class="content"> <div class="box_2"> <div class="class"></div> <div class="content"> <p>Text Level 2</p> <p>More Text Level 2</p> </div> </div> <div class="box_2"> <div class="class"></div> <div class="content"> <p>Text Level 2</p> <div class="box_3"> <div class="content"> <p>Text Level 3</p> </div> </div> </div> </div> </div></section><p>Lorem ipsum</p><p>Lorem ipsum</p><section class="box_1"> <header class="trigger"><h2>Title</h2></header> <div class="content"> <p>Text Level 1</p> </div></section><p>Lorem ipsum</p><p>Lorem ipsum</p>';
$html = str_get_html($html_string);
$array_content = array();
$index = 0;
foreach($html->find('p') as $key => $tag) {
    if($tag->parent()->tag == 'root') {
        // orphan <p>: sits directly under the root, outside any section, so level 0
        if(!isset($array_content[$index])) {
            $array_content[$index] = array('level' => 0, 'inhalt' => $tag->innertext);
        } else {
            // same batch as the previous <p>: append to it
            $array_content[$index]['inhalt'] .= "\n" . $tag->innertext;
        }

    } elseif($tag->parent->class == 'content') {
        // <p> inside div.content: the class of the element wrapping div.content gives the level
        $type = $tag->parent->parent->class;
        switch($type) {
            case 'box_1': $level = 1; break;
            case 'box_2': $level = 2; break;
            case 'box_3': $level = 3; break;
            default: $level = 0; // assumption: unknown wrappers fall back to level 0
        }

        if(!isset($array_content[$index])) {
            $array_content[$index] = array('level' => $level, 'inhalt' => $tag->innertext);
        } else {
            // same batch as the previous <p>: append to it
            $array_content[$index]['inhalt'] .= "\n" . $tag->innertext;
        }

    }

    // no next sibling, or the next sibling is not a <p>: the batch ends, move to the next index
    if(!isset($tag->next_sibling()->tag) || $tag->next_sibling()->tag != 'p') {
        $index++;
    }

}

echo '<pre>';
print_r($array_content);

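Comments:

I'm using find('p') because I need to get the order of the elements.
@user3142695 Please check my revision, I hope this one fits.
This works great! Thank you. I forgot something in my post, so I updated it. Please have a look. I hope you can solve this.
@user3142695 Wow! Your structure got more complicated, check my revisions.
@user3142695 I got the logic wrong, check the revision.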
Output of the answer's script:

Array
(
    [0] => Array
        (
            [level] => 0
            [inhalt] => Lorem ipsum
Lorem ipsum
        )

    [1] => Array
        (
            [level] => 2
            [inhalt] => Text Level 2
More Text Level 2
        )

    [2] => Array
        (
            [level] => 2
            [inhalt] => Text Level 2
        )

    [3] => Array
        (
            [level] => 3
            [inhalt] => Text Level 3
        )

    [4] => Array
        (
            [level] => 0
            [inhalt] => Lorem ipsum
Lorem ipsum
        )

    [5] => Array
        (
            [level] => 1
            [inhalt] => Text Level 1
        )

    [6] => Array
        (
            [level] => 0
            [inhalt] => Lorem ipsum
Lorem ipsum
        )

)
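
For comparison, the same batching idea can be sketched with PHP's bundled DOM extension instead of simple_html_dom. This is not the approach used in the answer, only an illustration under a few assumptions: `groupParagraphs()` and the `$levels` map are hypothetical names, the fragment is wrapped in a helper `<div id="root">`, and `textContent` (plain text) is used where the answer uses `innertext` (inner HTML).

```php
<?php
// Sketch only: groupParagraphs() and $levels are hypothetical names.
function groupParagraphs(string $htmlString): array
{
    // Class of the element wrapping div.content => nesting level.
    $levels = array('box_1' => 1, 'box_2' => 2, 'box_3' => 3);

    $doc = new DOMDocument();
    // libxml reports HTML5 tags such as <section> as errors; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div id="root">' . $htmlString . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $groups = array();

    // "//p" yields the paragraphs in document order.
    foreach ($xpath->query('//p') as $p) {
        $text = trim($p->textContent);

        // If the nearest preceding element sibling is a <p>, stay in the current batch.
        $followsP = $xpath->query('preceding-sibling::*[1][self::p]', $p)->length > 0;
        if ($followsP && count($groups) > 0) {
            $groups[count($groups) - 1]['inhalt'] .= "\n" . $text;
            continue;
        }

        // Otherwise start a new batch. Level 0 unless the parent is div.content
        // inside one of the known boxes.
        $level = 0;
        $parent = $p->parentNode;
        if ($parent instanceof DOMElement && $parent->getAttribute('class') === 'content') {
            $box = $parent->parentNode;
            if ($box instanceof DOMElement) {
                $level = $levels[$box->getAttribute('class')] ?? 0;
            }
        }
        $groups[] = array('level' => $level, 'inhalt' => $text);
    }

    return $groups;
}

// Shortened sample input: two orphan paragraphs, then two paragraphs in a box_2.
$html = '<p>Lorem ipsum</p><p>Lorem ipsum</p>'
      . '<section class="box_1"><div class="content">'
      . '<div class="box_2"><div class="content">'
      . '<p>Text Level 2</p><p>More Text Level 2</p>'
      . '</div></div></div></section>';

$groups = groupParagraphs($html);
print_r($groups);
```

The XPath check `preceding-sibling::*[1][self::p]` plays the role of the `next_sibling()` test in the answer: it looks backward instead of forward, so no running index is needed.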