PHP regex to match everything, including whitespace

php regex web-scraping xpath

I need a regex pattern that matches any characters, including whitespace, without storing them in the PHP result variable.

<li class="xyz" data-name="abc">
    <span id="XXX">some words</span>
    <div data-attribute="values">
        <a class="klm" href="http://example.com/blabla">somethings</a>
    </div>
    <div class="xyz sub" data-name="abc-sub"><a href="http://www.example.com/blabla/images"><img src="/images/any_image.jpg" class="qqwwee"></a></div>
</li><!--repeating li tags-->



I wrote this pattern:

preg_match_all('#<li((?s).*?)<div((?s).*?)href="((?s).*?)"((?s).*?)</li>#', $subject, $matches);

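Using (?:...) lets you group without capturing those groups. For example:

#<li(?:(?s).*?)<div(?:(?s).*?)href="((?s).*?)"(?:(?s).*?)</li>#

With this pattern, preg_match_all puts the whole <li> block in $matches[0] and only the captured URL in $matches[1]:

1 => 
array (
  0 => 'http://example.com/blabla',
),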

All of your matches will be in $matches[1], so loop over that array.
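
As a rough illustration of looping over $matches[1], here is a minimal sketch that runs the non-capturing pattern against two <li> items shaped like the markup in the question. The second item and its URL are made up for the example, and the variable names $count and $urls are assumptions, not part of the original answer.

```php
<?php
// Sample input: two <li> items shaped like the question's markup.
// The second item and its URL are hypothetical, added for illustration.
$subject = <<<'HTML'
<li class="xyz" data-name="abc">
    <span id="XXX">some words</span>
    <div data-attribute="values">
        <a class="klm" href="http://example.com/blabla">somethings</a>
    </div>
</li>
<li class="xyz" data-name="def">
    <span id="YYY">other words</span>
    <div data-attribute="values">
        <a class="klm" href="http://example.com/other">something else</a>
    </div>
</li>
HTML;

// Only the href value is captured; every other group is non-capturing.
$pattern = '#<li(?:(?s).*?)<div(?:(?s).*?)href="((?s).*?)"(?:(?s).*?)</li>#';

$count = preg_match_all($pattern, $subject, $matches);

// $matches[0] holds the full <li> blocks, $matches[1] the captured URLs.
$urls = $matches[1];
foreach ($urls as $url) {
    echo $url, PHP_EOL;
}
```

Because the lazy quantifiers stop at the first href inside each item, this picks up one URL per <li>; later links in the same item are skipped.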
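Read this famous answer on StackOverflow.
HTML is not a regular language, so it cannot be processed reliably with regular expressions. Use a proper, robust HTML parser instead.
Also note that data mining (analysis) != data collection.
If you don't want a regex group to store the "captured" data, use a non-capturing group: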
(?:some-complex-regexp-here)
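
To make the difference concrete, the following sketch compares a capturing and a non-capturing group on a one-line input. The input string, the (href|src) alternation, and the variable names are invented for this illustration only.

```php
<?php
// A made-up one-line input, for illustration only.
$input = '<a href="http://example.com/blabla">';

// Both groups capture: the attribute name lands in [1], the value in [2].
preg_match('#(href|src)="([^"]*)"#', $input, $capturing);

// (?:...) still groups the alternation but stores nothing for it,
// so the attribute value moves up to index 1.
preg_match('#(?:href|src)="([^"]*)"#', $input, $nonCapturing);

var_export($capturing);
echo PHP_EOL;
var_export($nonCapturing);
echo PHP_EOL;
```

Index 0 always holds the full match in both cases; only the numbering of the groups after it changes.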

In your case, the following might work:
(?s)<li.*?<div.*?href="([^"]*?)".*?</li>

The leading (?s) turns on single-line (DOTALL) mode, so . also matches newlines. Result:
array (
  0 => 
  array (
    0 => '<li class="xyz" data-name="abc">
    <span id="XXX">some words</span>
    <div data-attribute="values">
        <a class="klm" href="http://example.com/blabla">somethings</a>
    </div>
    <div class="xyz sub" data-name="abc-sub"><a href="http://www.example.com/blabla/images"><img src="/images/any_image.jpg" class="qqwwee"></a></div>
</li>',
  ),
  1 => 
  array (
    0 => 'http://example.com/blabla',
  ),
)
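
For comparison, the HTML-parser advice above could be sketched with PHP's built-in DOMDocument and DOMXPath classes. This is only one possible approach under stated assumptions: the <ul> wrapper is added here so the fragment parses cleanly, the XPath expressions are a guess at what the regex is meant to select (the first link inside a <div> of each <li class="xyz">), and the DOM extension must be enabled.

```php
<?php
// The question's markup, wrapped in a <ul> (the wrapper is an assumption).
$html = <<<'HTML'
<ul>
<li class="xyz" data-name="abc">
    <span id="XXX">some words</span>
    <div data-attribute="values">
        <a class="klm" href="http://example.com/blabla">somethings</a>
    </div>
    <div class="xyz sub" data-name="abc-sub"><a href="http://www.example.com/blabla/images"><img src="/images/any_image.jpg" class="qqwwee"></a></div>
</li>
</ul>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally instead of emitting them.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// For each <li class="xyz">, take the href of the first <a> nested in a <div>.
$hrefs = [];
foreach ($xpath->query('//li[@class="xyz"]') as $li) {
    $link = $xpath->query('.//div//a[@href]', $li)->item(0);
    if ($link instanceof DOMElement) {
        $hrefs[] = $link->getAttribute('href');
    }
}

print_r($hrefs);
```

Unlike the regex, this keeps working if attribute order or whitespace changes, and dropping the item(0) call would collect every link in the item rather than just the first.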
