PHP: regular expression works incorrectly


Why does this happen? The regular expression ignores the tag.
(.{0,5})%imm';
$s = file_get_contents("http://boringmachines.blogspot.com/2006/12/bitbin-herb-recordings.html");
$out = preg_match_all($p, $s, $matches, PREG_SET_ORDER);
print_r($matches);
I get this array:

Array
(
    [0] => Array
        (
            [0] => /div><a href="http://photos1.blogger.com/x/blogger/1112/3281/1600/484028/aliasEPlined.jpg"><img style="FLOAT: left; MARGIN: 0px 10px 10px 0px; WIDTH: 162px; CURSOR: hand; HEIGHT: 149px" height="124" alt="" src="http://photos1.blogger.com/x/blogger/1112/3281/320/925013/aliasEPlined.jpg" width="199" border="0" /></a><span style="font-size:85%;">Due to last weeks bad weather here in Glasgow, I was unable to connect to the web and keep up those regular <a href="http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=57230462">Herb Recordings </a>mp3's. Instead, I posted a <a href="http://boringmachines.blogspot.com/2006/11/bitbin-herb-recordings.html#links">video</a> of one of their earlier releases, BitBin. Thankfully, some good has came from thsoe storms, as Herb have kindly donated another mp3, in the form of "<em>May</em>" by BitBin.</span><br /><span style="font-size:85%;"></span><br /><span style="font-size:85%;"><a href="http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&amp;friendID=26396670">BitBin</a> is a London based artist and had his "Alias" ep released by Herb earlier this year. He influences are both broad, and for and electronic producer, quite unusual. The likes of Brian Eno, Bola and Warp Records, sit side by side with Brian Wilson, Captain Beefheart and dEUS. His bio may explain a few things, as BitBin claims he is all about "<em>glitching his way through any field of music and reality</em>"</span><br /><span style="font-size:85%;"></span><br /><span style="font-size:85%;">"<em>May</em>" itself is an expansive and dark slice of electronica reminiscent of Bola and Gescom. For me, however, this is akin to the music Thom Yorke has been pushing Radiohead towards over the last few years. 
The beats echo those of "<em>Idioteque</em>", and believe, me that is no bad thing.</span><br /><span style="font-size:85%;"></span><br /><span style="font-size:85%;">The "Alias" ep can be ordered<a href="http://profile.myspace.com/index.cfm?fuseaction=user.viewprofile&friendid=57230462"> here</a>, however, the cd release will feature 3 extra tracks, "<em>making it, one longer trip</em>". An <a href="http://www.urband.net/interview/bitbin/index.html">interview and podcast</a> with
            [1] => /div>
            [2] => interview and podcast
            [3] =>  with
        )

)
whereas the expected result is:

Array
(
    [0] => Array
        (
            [0] => . An <a href="http://www.urband.net/interview/bitbin/index.html">interview and podcast</a> with
            [1] => . An 
            [2] => interview and podcast
            [3] =>  with
        )

)

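Welcome to the joys and wonders of using regular expressions on HTML. Try using an XPath query on the parsed document instead to find what you are looking for in the HTML.

//a[contains(@href,'urband.net')]
An XPath query like this will be far more precise than a regular expression.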
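
A minimal sketch of how that XPath query could be run from PHP with the built-in DOM extension. The `$html` fragment and the `$links` array are assumptions made up for illustration; in the question the markup would come from `file_get_contents()`.

```php
<?php
// Hypothetical sample fragment, modelled on the end of the post quoted in the question.
$html = '<div><span>An <a href="http://www.urband.net/interview/bitbin/index.html">'
      . 'interview and podcast</a> with BitBin. See also '
      . '<a href="http://example.com/">another link</a>.</span></div>';

$doc = new DOMDocument();
// Real-world HTML is rarely well formed, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$links = [];
// Select only the anchors whose href contains the wanted host name.
foreach ($xpath->query("//a[contains(@href,'urband.net')]") as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'),
        'text' => $a->textContent,
    ];
}

print_r($links);
```

Because the parser has already split the document into elements, the query cannot accidentally start inside an earlier tag, which is the failure shown in the question.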


Try:

$url='urband\.net';
$p='%(.{0,5})(.{0,5})%imm';

Edit: tested with Perl:

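use strict;
use warnings;

# Slurp mode: read the whole input as one string instead of line by line.
$/ = undef;

# The HTML is expected after a __DATA__ marker at the end of this script.
my $str = <DATA>;
my $count = 0;

# [^"]* keeps the urband\.net test inside a single href value, so the match
# cannot begin in an earlier tag; /s lets . match newlines, /g finds every match.
while ($str =~ /(.{0,5})<a\s+href="[^"]*urband\.net[^"]*"\s*>(.*?)<\/a>(.{0,5})/sg)
{
   print "Array\n";
   print "(\n";
   print "    [$count] => Array\n";
   print "        (\n";
   print "            [0] => $&\n";
   print "            [1] => $1\n";
   print "            [2] => $2\n";
   print "            [3] => $3\n";
   print "        )\n";
   print "\n";
   print ")\n";
   ++$count;
}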
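
For comparison, here is a hedged PHP sketch of the same pattern, next to a looser one that reproduces the symptom from the question. The pattern in the question is truncated, so `$loose` is only a hypothetical stand-in, not the asker's actual regex; the sample string `$s` is also made up. The point it illustrates is that a regex engine reports the leftmost match, and a lazy `.*?` still stretches as far as needed, so a pattern that begins at the first `<a` on the page can run across later tags until it finds `urband.net`.

```php
<?php
// Hypothetical sample input: two links, only the second one points at urband.net.
$s = '<div>Intro <a href="http://example.com/">first</a> text. An '
   . '<a href="http://www.urband.net/interview/bitbin/index.html">interview and podcast</a>'
   . ' with BitBin</div>';

// Hypothetical loose pattern: ".*?" may cross tag boundaries, so the match
// starts at the FIRST <a and only ends at the urband.net link.
$loose = '%(.{0,5})<a\s.*?urband\.net.*?>(.*?)</a>(.{0,5})%s';

// Pattern from the Perl test, with % as delimiter so "/" needs no escaping.
// [^"]* cannot leave the href value, so an unrelated <a> cannot start the match.
$tight = '%(.{0,5})<a\s+href="[^"]*urband\.net[^"]*"\s*>(.*?)</a>(.{0,5})%s';

preg_match_all($loose, $s, $looseMatches, PREG_SET_ORDER);
preg_match_all($tight, $s, $tightMatches, PREG_SET_ORDER);

print_r($looseMatches); // group 1 is the text before the first link
print_r($tightMatches); // group 1 is the text before the urband.net link
```

In both cases groups 2 and 3 are correct and only the start of the match differs, which is the same shape as the output in the question.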

I don't understand StackOverflow users. If someone tries to parse HTML with regexps, the question always gets -1.

@Karolis, I don't think an HTML parser can do what I need.

@Karolis: I'm not the downvoter, but trying to parse HTML with regular expressions is evil: regexes were not created for that, it causes more trouble than it solves, and so on (try Google, you will find the same conclusion again and again). This is not about the users, it is about common sense. Use DOM (or DOMXPath) instead. It is simpler, more stable, and suited to the task. That said, asking about regular expressions on HTML is not bad in itself, but the answer should point in the right direction.

@Abel I know. But if someone tries to do something the wrong way, we help him do it the right way. Why downvote if it is a good question?

@Karolis: agreed. It is a pity, and we should not do that. Of course, we can only guess at the downvoter's reasons; perhaps he simply did not consider it a well-phrased question.

In this case, for the 5 characters: the query gives you the nodes of the <a> tags, then you can use previousSibling/nextSibling to extract the adjacent text nodes and take a 5-character substring.
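
The previousSibling/nextSibling idea from the last comment could look roughly like the sketch below. The `$html` fragment and the `$results` layout are assumptions for illustration. `substr()` counts bytes, which is fine for this ASCII sample; for multibyte text `mb_substr()` would be the safer choice.

```php
<?php
// Hypothetical sample fragment containing the link from the question.
$html = '<p>Read on. An <a href="http://www.urband.net/interview/bitbin/index.html">'
      . 'interview and podcast</a> with BitBin is online.</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate sloppy markup without printing warnings
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];
foreach ($xpath->query("//a[contains(@href,'urband.net')]") as $a) {
    $prev = $a->previousSibling;
    $next = $a->nextSibling;
    // Only text-node siblings carry context; an adjacent element (e.g. <br>) yields ''.
    $before = ($prev instanceof DOMText) ? $prev->data : '';
    $after  = ($next instanceof DOMText) ? $next->data : '';
    $results[] = [
        'before' => (string) substr($before, -5),   // last 5 bytes before the link
        'text'   => $a->textContent,
        'after'  => (string) substr($after, 0, 5),  // first 5 bytes after the link
    ];
}

print_r($results);
```

This yields the same three pieces the regex groups were meant to capture, without the match being able to start in an unrelated tag.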