PHP RegExp: find tags between specific tags

php regex

I have some HTML code that contains many hrefs, but I don't need all of them. I only want to get the hrefs contained in this div:

<div class="category-map second-links"> 
*****
</div> <p class="sec">

*****

The result I would like to see is:

<a href='xxx'>yyy</a>
<a href='zzz'>www</a>
...

...

My version (not working):

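(?

Use PHP Simple HTML DOM Parser:

// Create a DOM from a URL
$html = file_get_html('');

// Find the specific tags
$anchors = array();
foreach ($html->find('div.category-map.second-links a') as $anchor) {
    $anchors[] = $anchor;
}
print_r($anchors);
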
If you want to use regular expressions, you will probably need two regex queries: one to get all the divs, and a second to find the hrefs inside each div. That is because a single query like this

"<div.*?<a href='(?<data>.*?)'.*?</div>"

captures only one href per div, even when the div contains several links.

I'm not sure the pattern above works 100%, but it should give you a hint. I hope you can build a proper one from it.

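The two-query idea described above can be sketched roughly as follows. This is only an illustration: the sample markup, the variable names, and both patterns are assumptions made for the example, and the first pattern assumes the target div contains no nested divs.

```php
<?php
// Hypothetical sample input; a real page would be loaded from elsewhere.
$html = <<<'HTML'
<div class="category-map second-links">
    <a href='xxx'>yyy</a>
    <a href='zzz'>www</a>
</div> <p class="sec">
<div class="other"><a href='skip'>no</a></div>
HTML;

$hrefs = array();

// Query 1: capture the body of each target div (assumes no nested divs).
preg_match_all(
    '~<div\s+class="category-map second-links"\s*>(.*?)</div>~s',
    $html,
    $divs
);

// Query 2: collect every href value inside each captured div body.
foreach ($divs[1] as $body) {
    if (preg_match_all('~<a\s[^>]*href=([\'"])(.*?)\1~i', $body, $m)) {
        $hrefs = array_merge($hrefs, $m[2]);
    }
}

print_r($hrefs);
```

Splitting the work this way means the second query runs once per div body, so every link in a div is found instead of just the first one.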
Disclaimer: you are better off using a proper HTML parser. This answer is for educational purposes, although it is more reliable than your average regex if the input is valid HTML :p

Regex is great, so I decided to do it in two parts:

1. Match everything inside <div class="category-map second-links">, even when divs are nested.
2. Loop over those matches and match each <a ...>...</a>.

The two annotated patterns and the complete script are listed further down the page.

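If you load the HTML into a DOM document, you can use XPath to query nodes from it.

All a elements in the document:

//a

That have an ancestor/parent div element:

//a[ancestor::div]

With the class attribute category-map second-links:

//a[ancestor::div[@class = "category-map second-links"]]

Get the href attributes of the filtered a elements (optional):

//a[ancestor::div[@class = "category-map second-links"]]/@href

Full example: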
$html = <<<'HTML'
<div class="category-map second-links"> 
*****
    <!--<div class="category-map second-links"> Comment hacks --> 
    <div class="category-map second-links">
        <a href='xxx'>yyy</a>
        <a href='zzz'>www</a>
...
    </div>
<div class="category-map second-links"> 
*****
    <!--<div class="category-map second-links"> Comment hacks --> 
    <div class="category-map second-links">
        <a href='aaa'>bbb</a>
        <a href='ccc'>ddd</a>
...
    </div>
</div> <p class="sec">
HTML;

$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXpath($dom);

// fetch the href attributes
$hrefs = array();
foreach ($xpath->evaluate('//a[ancestor::div[@class = "category-map second-links"]]/@href') as $node) {
  $hrefs[] = $node->value;
}
var_dump($hrefs);

// fetch the a elements and read some data from them
$linkData = array();
foreach ($xpath->evaluate('//a[ancestor::div[@class = "category-map second-links"]]') as $node) {
  $linkData[] = array(
    'href' => $node->getAttribute('href'),
    'text' => $node->nodeValue,
  );
}
var_dump($linkData);

// fetch the a elements and store their html
$links = array();
foreach ($xpath->evaluate('//a[ancestor::div[@class = "category-map second-links"]]') as $node) {
  $links[] = $dom->saveHtml($node);
}
var_dump($links);
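
The question also requires that the div is followed by <p class="sec">. A minimal sketch of how that condition might be added to the XPath expression is shown below. It assumes the p element is the first element sibling after the div; the sample markup and variable names are made up for this illustration.

```php
<?php
// Hypothetical markup: only the first div is directly followed by <p class="sec">.
$html = <<<'HTML'
<html><body>
<div class="category-map second-links">
    <a href='xxx'>yyy</a>
    <a href='zzz'>www</a>
</div> <p class="sec">text</p>
<div class="category-map second-links">
    <a href='other'>not wanted</a>
</div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select the div by class, require that its first following element
// sibling is <p class="sec">, then take the href of every a inside it.
$query = '//div[@class = "category-map second-links"]'
       . '[following-sibling::*[1][self::p][@class = "sec"]]'
       . '//a/@href';

$hrefs = array();
foreach ($xpath->evaluate($query) as $attr) {
    $hrefs[] = $attr->value;
}
var_dump($hrefs);
```

Whitespace between the div and the p does not matter here, because following-sibling::* only looks at element nodes.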

$dom->find('div a')->attrib('href');

<div\s+class\s*=\s*"\s*category-map\s+second-links\s*"\s*>    # match a certain div with a certain classes
(?:                                                           # non-capturing group
   (?:<!--.*?-->)?                                            # Match the comments !
   (?:(?!</?div[^>]*>).)                                      # check if there is no start/closing tag
   |                                                          # or (which means there is)
   (?R)                                                       # Recurse the pattern, it's the same as (?0)
)*                                                            # repeat zero or more times
</div\s*>                                                     # match the closing tag
(?=.*?<p\s+class\s*=\s*"\s*sec\s*"\s*>)                       # make sure there is <p class="sec"> ahead of the expression

<a[^>]*>    # match the beginning a tag
.*?         # match everything ungreedy until ...
</a\s*>     # match </a       > or </a>
# Not forgetting the xsi modifiers
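
To see the recursive part in isolation, here is a stripped-down sketch of the same (?R) technique on a small, well-formed input. The input string and the simplified pattern are assumptions made for this illustration; the class check, the comment handling, and the lookahead from the full pattern are left out.

```php
<?php
// Hypothetical, well-formed sample input with one nested div.
$input = '<div>a<div>b</div>c</div><div>d</div>';

preg_match_all('~
    <div[^>]*>                 # opening tag
    (?:
        (?!</?div\b).          # any character that does not start a div tag
        |
        (?R)                   # or a complete nested div (recurse whole pattern)
    )*
    </div\s*>                  # closing tag that balances the opening one
~sxi', $input, $matches);

print_r($matches[0]);
```

Each top-level div is returned as one match together with the divs nested inside it, which is why the second pattern can then be run on every match separately.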

$input = '<div class="category-map second-links"> 
*****
    <!--<div class="category-map second-links"> Comment hacks --> 
    <div class="category-map second-links">
        <a href=\'xxx\'>yyy</a>
        <a href=\'zzz\'>www</a>
...
    </div>
<div class="category-map second-links"> 
*****
    <!--<div class="category-map second-links"> Comment hacks --> 
    <div class="category-map second-links">
        <a href=\'aaa\'>bbb</a>
        <a href=\'ccc\'>ddd</a>
...
    </div>
</div> <p class="sec">';

$links = array();

preg_match_all('~
<div\s+class\s*=\s*"\s*category-map\s+second-links\s*"\s*>    # match a certain div with a certain classes
(?:                                                           # non-capturing group
   (?:<!--.*?-->)?                                            # Match the comments !
   (?:(?!</?div[^>]*>).)                                      # check if there is no start/closing tag
   |                                                          # or (which means there is)
   (?R)                                                       # Recurse the pattern, it\'s the same as (?0)
)*                                                            # repeat zero or more times
</div\s*>                                                     # match the closing tag
(?=.*?<p\s+class\s*=\s*"\s*sec\s*"\s*>)                       # make sure there is <p class="sec"> ahead of the expression
~sxi', $input, $matches);

if(isset($matches[0])){
    foreach($matches[0] as $match){
        preg_match_all('~
                            <a[^>]*>    # match the beginning a tag
                            .*?         # match everything ungreedy until ...
                            </a\s*>     # match </a       > or </a>
                        ~isx', $match, $tempLinks);
        if(isset($tempLinks[0])){
            // merge the links of this div into one flat list
            $links = array_merge($links, $tempLinks[0]);
        }
    }
}

if(!empty($links)){
    print_r($links);
}else{
    echo 'empty :(';
}
