Php: "Vertical" regex matching in an ASCII "image"
Note: This is a question about the possibilities of modern regex flavors. It's not about the best way to solve this with other methods. It was inspired by an earlier question, but that one is not restricted to regex.
The Problem
In an ASCII "image"/art/map/string like:
....X.......
..X..X...X....
X.X...X..X.....
X....XXXXXX.....
X..XXX...........
.....X..........
..............X
..X...........X....
..X...........X....X...
....X.....
I'd like to find a simple vertical line formation of three Xs:
X
X
X
The number of lines in the image is variable, and the width of each line is variable too.
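The regex answers below are easier to check against a plain reference result. The following Python sketch is not part of any answer, and the name vertical_triples is made up. It scans the image cell by cell, treats a cell past the end of a shorter line as "not X", and reports the top X of every vertical X/X/X as a 0-based (row, column):

```python
def vertical_triples(image: str) -> list[tuple[int, int]]:
    """Return (row, col) of the top X of every vertical X/X/X, 0-based."""
    rows = image.split("\n")

    def at(r: int, c: int) -> str:
        # Lines differ in width: a cell past the end of a line is "not X".
        return rows[r][c] if c < len(rows[r]) else ""

    hits = []
    for r in range(len(rows) - 2):
        for c, ch in enumerate(rows[r]):
            if ch == "X" and at(r + 1, c) == "X" and at(r + 2, c) == "X":
                hits.append((r, c))
    return hits
```

On the example image this returns [(1, 9), (2, 0), (3, 5), (6, 14)], i.e. four formations. Overlapping formations in a longer column are all reported.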
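The Question
With regex (PCRE/PHP, Perl, .NET or similar), is it possible to:
1. determine whether such a formation exists?
2. count the number of such formations (or match the starting point of each of them)?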
You could rotate the image and then search for
XXX
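Edit
The following solutions have two serious problems:
1. They can't match multiple XXX sequences that start on the same line, because pos advances too far.
2. The second solution is incorrect: it matches consecutive lines in which two Xs are directly above each other. There don't necessarily have to be three in a row.
Original answer
This is an answer that embeds Perl code in the regex. Perl regexes can use code to assert arbitrary conditions inside the regex or to emit partial regexes. Because of that, they are not limited to matching regular or context-free languages, and can match some parts of languages higher up in the Chomsky hierarchy.
The language you want to match can be described in regex terms as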
^ .{n} X .*\n
.{n} X .*\n
.{n} X
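where n is a number. This is about as complex as matching the a^n b^n c^n language, which is the canonical example of a context-sensitive language.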
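For any one fixed n, the three lines above form an ordinary regular expression. The difficulty lies only in tying the three n's together. A Python sketch can simply try every n up to the widest line of the input. This is a brute-force workaround, not a single regex, and the function names are made up:

```python
import re

def rows_for_fixed_n(image: str, n: int) -> list[int]:
    # For one fixed n, ^.{n}X.*\n.{n}X.*\n.{n}X is an ordinary regular expression.
    # Wrapping it in a zero-width lookahead lets overlapping hits (4+ Xs) all show up.
    pattern = re.compile(r"^(?=.{%d}X.*\n.{%d}X.*\n.{%d}X)" % (n, n, n), re.MULTILINE)
    return [image.count("\n", 0, m.start()) for m in pattern.finditer(image)]

def all_formations(image: str) -> list[tuple[int, int]]:
    # Trying every n up to the widest line covers all columns of this input.
    width = max(len(line) for line in image.split("\n"))
    return sorted((r, n) for n in range(width) for r in rows_for_fixed_n(image, n))
```

This only works because a given input has a bounded width. The Perl answer instead derives n from the first line at match time.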
We can match the first line easily, and use some Perl code to emit the regex for the other lines:
/^ (.*?) X
(?: .*\n (??{"." x length($1)}) X){2}
/mx
That was short! What does it do?
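- ^ (.*?) X anchors at the start of a line, matches as few non-newline characters as possible, and then the X. We remember the line up to the X as capture group $1.
- We repeat a group two times. The group matches the rest of the line and a newline, and then injects a regex that matches a string of the same length as $1. After that, there must be an X.
This regex will now match every string that has three Xs on top of each other.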
If we want to extract all such sequences, we have to be nifty, because sequences may overlap, e.g.
.X
XX
XX
X.
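the position where the next match starts must not proceed past the first X. We can do this with a lookbehind and a lookahead. Perl only supports constant-length lookbehind, but it has the \K escape, which provides similar semantics. Thus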
/^ (.*?) \K X
(?=( (?: .*\n (??{"."x length($1)}) X ){2} ))
/gmx
will match every sequence of three vertical Xes. Testing time:
$ perl -E'my$_=join"",<>; say "===\n$1X$2" while /^(.*?)\KX(?=((?:.*\n(??{"."x length($1)})X){2}))/gmx' <<'END'
....X.......
..X..X...X....
X.X...X..X.....
X....XXXXXX.....
X..XXX...........
.....X..........
..............X
..X...........X....
..X...........X....X...
....X.....
END
===
..X..X...X....
X.X...X..X.....
X....XXXXX
===
X.X...X..X.....
X....XXXXXX.....
X
===
X....XXXXXX.....
X..XXX...........
.....X
===
..............X
..X...........X....
..X...........X
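Python's re module has no (??{...}), but the same two-stage idea can be sketched. The first stage finds an X on a line, and the second compiles the rest of the pattern for that particular column at run time. The sketch below is a hypothetical emulation, not Perl's engine, and the name perl_style_hits is made up. It also reproduces the first problem from the Edit note: /g resumes right after the matched X, and ^ can only match at a line start, so at most one formation per line is reported.

```python
import re

def perl_style_hits(image: str) -> list[tuple[int, int]]:
    hits = []
    line_start = 0  # offset of the current line in image
    for r, line in enumerate(image.split("\n")):
        # ^(.*?)X: the lazy prefix tries the Xs of this line from left to right.
        for n, ch in enumerate(line):
            if ch != "X":
                continue
            # (??{"." x length($1)}): build the rest of the pattern for this n at run time.
            rest = re.compile(r"(?:.*\n.{%d}X){2}" % n)
            if rest.match(image, line_start + n + 1):
                hits.append((r, n))
                # /g resumes right after this X, and ^ cannot match again
                # until the next line, so at most one hit per line.
                break
        line_start += len(line) + 1
    return hits
```

On the example image it finds the same four formations as the test run above. On "X.X\nX.X\nX.X" it reports only (0, 0), although column 2 holds a second formation.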
We want to assert that the leading ... part of each line has the same length. We can do so by recursion with the base case X.*\n:
(X.*\n|.(?-1).)X
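If we anchor that at the start of a line, we can match two vertical Xes. To match more than two lines, we have to do the recursion in a lookahead, then advance the match position to the next line, and repeat. For this, we simply match .*\n.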
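The recursion (X.*\n|.(?-1).)X can be unrolled to a fixed depth to show what it balances. Each level adds one . before the X on the first line and one . after the newline, so the two prefixes always have the same length. Python's re has no recursion, so this sketch builds the finite expansion as a string. The name unrolled_pair and the depth of 20 are illustrative choices.

```python
import re

def unrolled_pair(max_depth: int) -> str:
    # Depth 0 is the base case X.*\n (the X sits in column 0).
    group = r"X.*\n"
    for _ in range(max_depth):
        # Each level wraps the previous one as .(?:...). : one more character
        # before the X on the first line and one more after the newline.
        group = r"X.*\n|.(?:" + group + r")."
    return "(?:" + group + ")X"

# Handles an X in any column from 0 to 20 (the depth is an arbitrary choice).
pair = re.compile(unrolled_pair(20))
```

For example, pair.match("..X...\n..X") succeeds, while pair.match("..X...\n...X") fails because the second X is one column further right.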
This results in the following regex, which can match a string with three vertical Xes:
/ ^
(?:
(?=( X.*\n | .(?-1). ) X)
.*\n # go to next line
){2}
/mx
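The regex above can be tried directly in Python if the recursion is unrolled to a fixed depth. This is an assumption: it only covers Xs in columns 0 to 20. The sketch also shows the second problem from the Edit note. Each iteration of {2} checks its own pair of lines, so a pair in column 0 followed by a pair in column 1 is accepted, although no three Xs are stacked.

```python
import re

def unrolled_pair(max_depth: int) -> str:
    # (X.*\n|.(?-1).)X expanded to a fixed depth, since Python's re cannot recurse.
    group = r"X.*\n"
    for _ in range(max_depth):
        group = r"X.*\n|.(?:" + group + r")."
    return "(?:" + group + ")X"

# /^(?:(?=(X.*\n|.(?-1).)X).*\n){2}/mx with the recursion unrolled to depth 20.
three_lines = re.compile(r"^(?:(?=" + unrolled_pair(20) + r").*\n){2}", re.MULTILINE)

# Each iteration of {2} checks its own pair of lines; nothing ties the column
# of the first pair to the column of the second one.
print(bool(three_lines.search("X\nX\nX")))     # True: three stacked Xs
print(bool(three_lines.search("X.\nXX\n.X")))  # also True, although no column has three Xs
```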
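But this isn't good enough, because we want to match all such sequences. To do this, we essentially put the whole regex into a lookahead. The regex engine makes sure to advance the position every time to produce a new match.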
/ ^
(?=
(
(?:
(?= (X.*\n | .(?-1). ) X)
.*\n # go to next line
){2}
.* # include next line in $1
)
)
/mx
Testing time:
$ perl -E'my$_=join"",<>; say "===\n$1" while /^(?=((?:(?=(X.*\n|.(?-1).)X).*\n){2}.*))/gmx' <<'END'
....X.......
..X..X...X....
X.X...X..X.....
X....XXXXXX.....
X..XXX...........
.....X..........
..............X
..X...........X....
..X...........X....X...
....X.....
END
===
..X..X...X....
X.X...X..X.....
X....XXXXXX.....
===
X.X...X..X.....
X....XXXXXX.....
X..XXX...........
===
X....XXXXXX.....
X..XXX...........
.....X..........
===
..............X
..X...........X....
..X...........X....X...
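If you want to find a single "vertical" pattern, here's a solution. If you also want to match a "horizontal" pattern, try doing that as a separate match, perhaps checking for overlapping match positions. Remember that the computer doesn't know what a line is. That's an arbitrary thing made up by humans. A string is just a one-dimensional sequence in which we designate some character(s) as the line ending.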
#!/usr/local/perls/perl-5.18.0/bin/perl
use v5.10;
use strict;
use warnings;
use Data::Dumper;

my $pattern = qr/XXX/;
my $string =<<'HERE';
....X.......
..X..X...X....
X.X...X..X.....
X....XXXXXX.....
X..XXX...........
.....X..........
..............X
..X...........X....
..X...........X....X...
....X.....
HERE

my $transposed = transpose_string( $string );
my @found;
open my $tfh, '<', \ $transposed;
while( <$tfh> ) {
    while( /$pattern/g ) {
        my $pos = $-[0];        # offset where this XXX starts
        push @found, { row => $pos, col => $. - 1 };
        pos = $pos + 1;         # for overlapping matches
    }
}
close $tfh;

# row and col are 0 based
print Dumper( \@found );

sub transpose_string {
    my( $string ) = @_;
    open my $sfh, '<', \ $string;
    my @transposed;
    my $row = 0;
    while( <$sfh> ) {
        chomp;
        my @chars = split //;
        foreach my $col ( 0 .. $#chars ) {
            $transposed[$col][$row] = $chars[$col];
        }
        $row++;
    }
    close $sfh;

    # Lines have different lengths, so some cells were never set. Fill them
    # with a space, otherwise join() would shift the rest of the column up.
    my $transposed = '';
    foreach my $col ( 0 .. $#transposed ) {
        $transposed .= join '', map { $_ // ' ' } @{ $transposed[$col] };
        $transposed .= "\n";
    }
    return $transposed;
}
First off: great question. I think it can be very instructive to try to push regex engines to their limits.
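The basic .NET solution
You said in the comments that it would be easy with .NET, but since there is no answer for that yet, I thought I'd write one.
You can solve both 1. and 2. using .NET's variable-length lookbehind and balancing groups. Most of the work is done by the balancing groups, but the variable-length lookbehind is crucial for detecting multiple matches that start on the same line.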
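Anyway, here is the pattern. It needs RegexOptions.Multiline so that ^ matches at the start of every line, and RegexOptions.IgnorePatternWhitespace for the free-spacing layout and comments: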
(?<= # lookbehind counts position of X into stack
^(?:(?<a>).)* # push an empty capture on the 'a' stack for each character
# in front of X
) # end of lookbehind
X # match X
(?=.*\n # lookahead checks that there are two more Xs right below
(?:(?<-a>)(?<b>).)* # while we can pop an element from stack 'a', push an
# element onto 'b' and consume a character
(?(a)(?!)) # make sure that stack 'a' is empty
X.*\n # match X and the rest of the line
(?:(?<-b>).)* # while we can pop an element from stack 'b', and consume
# a character
(?(b)(?!)) # make sure that stack 'b' is empty
X # match a final X
) # end of lookahead
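.NET's balancing groups behave like stacks of captures. The following Python sketch models the bookkeeping of the pattern above with plain lists:
- Stack 'a' receives one entry per character before the X.
- The first loop moves entries from 'a' to 'b' while consuming the next line.
- The second loop drains 'b' on the line after that.

It is a model of the stack discipline only, not of how the .NET engine works internally, and the function name is hypothetical.

```python
def balancing_groups_model(image: str) -> list[tuple[int, int]]:
    rows = image.split("\n")
    hits = []
    for r in range(len(rows) - 2):
        below, below2 = rows[r + 1], rows[r + 2]
        for c, ch in enumerate(rows[r]):
            if ch != "X":
                continue
            # Lookbehind ^(?:(?<a>).)*: one capture on stack 'a' per character before X.
            a, b = [""] * c, []
            # (?:(?<-a>)(?<b>).)*: pop 'a', push 'b', consume one char of the next line.
            i = 0
            while a and i < len(below):
                b.append(a.pop())
                i += 1
            # (?(a)(?!)) X: 'a' must be empty and the next character must be X.
            if a or below[i:i + 1] != "X":
                continue
            # (?:(?<-b>).)* (?(b)(?!)) X: the same again on the line after that.
            j = 0
            while b and j < len(below2):
                b.pop()
                j += 1
            if b or below2[j:j + 1] != "X":
                continue
            hits.append((r, c))
    return hits
```

Because every X is tried on its own, as the lookbehind allows, two formations that start on the same line are both found: "X.X\nX.X\nX.X" yields [(0, 0), (0, 2)].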
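For a column of four Xs, this gives two matches, one on the first line and one on the second line. We want to avoid that and report only one match (or two if there are 6 to 8 Xs, three if there are 9 to 11 Xs, and so on). Furthermore, we want to report the matches at the 1st, 4th, 7th, ... X.
We can adjust the pattern above to achieve this by requiring that the first X be preceded by an integer multiple of 3 other Xs in the same column (possibly zero). The basic idea for checking this uses the same stack manipulation as before. The difference is that we shift the captures back and forth among three stacks, so that after finding three Xs we end up where we started. To do this, we have to tweak the lookbehind a bit: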
(?<=
# note that the lookbehind below does NOT affect the state of stack 'a'!
# in fact, negative lookarounds can never change any capturing state.
# this is because they have to fail for the engine to continue matching.
# and if they fail, the engine needs to backtrack out of them, in which
# case the previous capturing state will be restored.
(?<! # if we get here, there is another X on top of the last
# one in the loop, and the pattern fails
^ # make sure we reached the beginning of the line
(?(a)(?!)) # make sure that stack 'a' is empty
(?:(?<-a>).)* # while we can pop an element from stack 'a', and consume
# a character
X.*\n # consume the next line and a potential X
)
# at this point we know that there are fewer than 3 Xs in the same column
# above this position. but there might still be one or two more. these
# are the cases we now have to eliminate, and we use a nested negative
# lookbehind for this. the lookbehind simply checks the next row and
# asserts that there is no further X in the same column.
# this, together with the loop below, means that the X we are going to match
# is either the topmost in its column or preceded by an integer multiple of 3
# Xs - exactly what we are looking for.
(?:
# at this point we've advanced the lookbehind's "cursor" by exactly 3 Xs
# in the same column, AND we've restored the same amount of captures on
# stack 'a', so we're left in exactly the same state as before and can
# potentially match another 3 Xs upwards this way.
# the fact that stack 'a' is unaffected by a full iteration of this loop is
# also crucial for the later (lookahead) part to work regardless of the
# amount of Xs we've looked at here.
^ # make sure we reached the beginning of the line
(?(c)(?!)) # make sure that stack 'c' is empty
(?:(?<-c>)(?<a>).)* # while we can pop an element from stack 'c', push an
# element onto 'a' and consume a character
X.*\n # consume the next line and a potential X
(?(b)(?!)) # make sure that stack 'b' is empty
(?:(?<-b>)(?<c>).)* # while we can pop an element from stack 'b', push an
# element onto 'c' and consume a character
X.*\n # consume the next line and a potential X
(?(a)(?!)) # make sure that stack 'a' is empty
(?:(?<-a>)(?<b>).)* # while we can pop an element from stack 'a', push an
# element onto 'b' and consume a character
X.*\n # consume the next line and a potential X
)* # this non-capturing group will match exactly 3 leading
# Xs in the same column. we repeat this group 0 or more
# times to match an integer-multiple of 3 occurrences.
^ # make sure we reached the beginning of the line
(?:(?<a>).)* # push an empty capture on the 'a' stack for each
# character in front of X
) # end of lookbehind (or rather beginning)
# the rest is the same as before
X # match X
(?=.*\n # lookahead checks that there are two more Xs right below
(?:(?<-a>)(?<b>).)* # while we can pop an element from stack 'a', push an
# element onto 'b' and consume a character
(?(a)(?!)) # make sure that stack 'a' is empty
X.*\n # match X and the rest of the line
(?:(?<-b>).)* # while we can pop an element from stack 'b', and consume
# a character
(?(b)(?!)) # make sure that stack 'b' is empty
X # match a final X
) # end of lookahead
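The effect of the extended lookbehind can be stated without stacks. An X is reported only if the unbroken run of Xs directly above it has a length divisible by 3, and the lookahead still demands two Xs below it. A Python sketch of that rule (the function name is illustrative):

```python
def non_overlapping_starts(image: str) -> list[tuple[int, int]]:
    rows = image.split("\n")

    def is_x(r: int, c: int) -> bool:
        return 0 <= r < len(rows) and c < len(rows[r]) and rows[r][c] == "X"

    hits = []
    for r, row in enumerate(rows):
        for c, ch in enumerate(row):
            if ch != "X":
                continue
            # Lookbehind: count the unbroken run of Xs directly above this one.
            above = 0
            while is_x(r - above - 1, c):
                above += 1
            # Accept only the topmost X or one preceded by a multiple of 3 Xs ...
            if above % 3 != 0:
                continue
            # ... and, like the lookahead, require two more Xs directly below.
            if is_x(r + 1, c) and is_x(r + 2, c):
                hits.append((r, c))
    return hits
```

A column of four Xs now gives one match, and a column of seven gives two, at the 1st and 4th X.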
^
(?:(?|
(?(5)(?![\s\S]*+\5))
(?!(?!)()())
(?=
(?:
.
(?=
.*+\n
( \3? . )
.*+\n
( \4? . )
)
)*?
X .*+\n
\3
X .*+\n
\4
)
()
|
(?(5)(?=[\s\S]*+\5)|(?!))
(?:
.
(?=
.*+\n
( \1? .)
.*+\n
( \2? .)
)
)+?
(?=
(?<=X).*+\n
(\1)
(?<=X).*+\n
(\2)
(?<=X)
)
(?=
([\s\S])
[\s\S]*
([\s\S] (?(6)\6))
)
){2})+
^
(?:(?|
checkForNextColumn
|
countAndAdvance
){2})+
^(?:(?|
(?(5)(?![\s\S]*+\5)) # if group 5 has matched before make sure that
# it didn't match empty
checkForNextColumn # contains 4 capturing groups
() # this is group 5, match empty
|
(?(5)(?=[\s\S]*+\5)|(?!)) # make sure that group 5 is defined and that it
# matched empty
advanceEngineState # contains 4 capturing groups
(?=
([\s\S]) # this is group 5, match non-empty
[\s\S]* # advance to the very end of the string
([\s\S] (?(6)\6)) # add a character from the end of the string to
# group 6
)
){2})+
..X..X..
..X..X..
..X..X..
(?:
.
(?=
.*+\n
( \1? .)
.*+\n
( \2? .)
)
)+?
(?=
(?<=X) .*+\n
(\1)
(?<=X) .*+\n
(\2)
(?<=X)
)
(?!(?!)()())
(?=
(?:
.
(?=
.*+\n
( \3? . )
.*+\n
( \4? . )
)
)*?
X .*+\n
\3
X .*+\n
\4
)
(?xm) # ignore comments and whitespace, ^ matches beginning of line
^ # beginning of line
(?:
. # any character except \n
(?= # lookahead
.*+\n # go to next line
( \1?+ . ) # add a character to the 1st capturing group
.*+\n # next line
( \2?+ . ) # add a character to the 2nd capturing group
)
)*? # repeat as few times as needed
X .*+\n # X on the first line and advance to next line
\1?+ # if 1st capturing group is defined, use it, consuming exactly the same number of characters as on the first line
X .*+\n # X on the 2nd line and advance to next line
\2?+ # if the 2nd capturing group is defined, use it, consuming exactly the same number of characters as on the first line
X # X on the 3rd line
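This pattern relies on a forward-reference trick. Each time the loop consumes a character on the first line, ( \1?+ . ) rebuilds group 1 as its previous value plus one more character of line 2, and likewise for group 2 on line 3. So \1 and \2 always span exactly as many characters as have been consumed on line 1. Below is a Python model of that bookkeeping, with the strings g1 and g2 standing in for the two groups. It models the idea, not PCRE's matcher, and the function name is made up.

```python
def first_vertical_xxx(image: str):
    rows = image.split("\n")
    for r in range(len(rows) - 2):
        line1, line2, line3 = rows[r], rows[r + 1], rows[r + 2]
        g1 = g2 = ""  # stand-ins for capturing groups 1 and 2
        for ch in line1:
            k = len(g1)  # characters consumed so far on line 1
            # X .*+\n \1?+ X .*+\n \2?+ X: \1 and \2 skip exactly k characters.
            if ch == "X" and line2[len(g1):len(g1) + 1] == "X" and line3[len(g2):len(g2) + 1] == "X":
                return (r, k)
            # . (?= .*+\n ( \1?+ . ) .*+\n ( \2?+ . ) ): consume this character and
            # grow each group by one; the lookahead fails once line 2 or 3 runs out.
            if len(g1) >= len(line2) or len(g2) >= len(line3):
                break
            g1 += line2[len(g1)]
            g2 += line3[len(g2)]
    return None
```

Like the regex without /g, it stops at the first formation. On the example image that is the X at row 1, column 9.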
^
(?:
(?: # match .+? characters
.
(?= # counting the same number on the following two lines
.*+\n
( \1?+ . )
.*+\n
( \2?+ . )
)
)+?
(?<= X ) # till the above consumes an X
(?= # that matches the following conditions
.*+\n
\1?+
(?<= X )
.*+\n
\2?+
(?<= X )
)
(?= # count the number of matches
.*+\n
( \3?+ . ) # the number of matches = length of $3
)
)* # repeat as long as there are matches on this line
.*\n? # remove the rest of the line
$in =~ s/regex/$3/gmx;
$count = length $in;
Test #0:
--------------------
X
X
X
result: 1 (X)
Test #1:
--------------------
..X....
..X....
..X....
result: 1 (.)
Test #2:
--------------------
..X.X..
..X.X..
....X..
result: 1 (.)
Test #3:
--------------------
..X....
..X....
...X...
result: 0 ()
Test #4:
--------------------
..X....
...X...
..X....
result: 0 ()
Test #5:
--------------------
....X..
.X..X..
.X.....
result: 0 ()
Test #6:
--------------------
.X..X..
.X.X...
.X.X...
result: 1 (.)
Test #7:
--------------------
.X..X..
.X..X..
.X..X..
result: 2 (.X)
Test #8:
--------------------
XXX
XXX
XXX
result: 3 (XXX)
Test #9:
--------------------
X.X.X
XXXXX
XXXXX
.X.X.
result: 5 (XXXXX)
Test #10:
--------------------
1....X.......
2..X..X...X....
3X.X...X..X.....
4X....XXXXXX.....
5X..XXX...........
6.....X..........
7.........X....X
8..X......X....X....
9..X......X....X....X...
A....X.....
B.X..X..
C.....
XXX
XXX
XXX
.
result: 8 (3458.XXX)
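The substitution works because every line is replaced by $3, whose length equals the number of formations on that line. The length of what remains is therefore the total count. Python's re cannot express the counting pattern itself. In this sketch the per-line count is computed directly, and re.sub only reproduces the "replace every line, then measure the length" step. The name count_by_substitution is hypothetical.

```python
import re

def count_by_substitution(image: str) -> int:
    rows = image.split("\n")

    def is_x(r: int, c: int) -> bool:
        return r < len(rows) and c < len(rows[r]) and rows[r][c] == "X"

    def one_char_per_formation(m) -> str:
        r = image.count("\n", 0, m.start())  # which line this match covers
        starts = [c for c, ch in enumerate(rows[r])
                  if ch == "X" and is_x(r + 1, c) and is_x(r + 2, c)]
        return "." * len(starts)  # stand-in for the characters of $3

    # Like s/regex/$3/gmx with .*\n? at the end: every line, including its
    # newline, is replaced, so the length of what is left is the count.
    leftover = re.sub(r"(?m)^.*\n?", one_char_per_formation, image)
    return len(leftover)
```

It gives 0, 2 and 5 for the inputs of Test #3, Test #7 and Test #9 above, matching the listed results.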
// assuming $input contains your string
$input = explode("\n", $input);
$rotated = array();
foreach ($input as $line)
{
$l = strlen($line);
for ($i = 0; $i < $l; $i++)
{
if (isset($rotated[$i]))
$rotated[$i] .= $line[$i];
else
$rotated[$i] = $line[$i];
}
}
$rotated = implode("\n", $rotated);
..XXX.....
..........
.XX....XX.
....X.....
X...X....X
.X.XXX....
..XX......
...X......
...X......
.XXX......
...X.....
.........
........
........
....XXX
.....
...
..
..
X
.
.
.
if (preg_match_all('/\bXXX\b/', $rotated, $m))
var_dump($m[0]);
array(4) {
[0] =>
string(3) "XXX"
[1] =>
string(3) "XXX"
[2] =>
string(3) "XXX"
[3] =>
string(3) "XXX"
}
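The PHP rotation has two limitations:
- It appends only the characters a line actually has. When a shorter line sits between two longer ones, the later characters of that column move up a row and can form a false XXX.
- \bXXX\b only matches runs of exactly three Xs.

A Python sketch that avoids both pads every line to the same width before transposing, and uses a zero-width lookahead to count overlapping triples. The padding character '.' and the function names are assumptions.

```python
import re

def transpose_padded(image: str, fill: str = ".") -> list[str]:
    rows = image.split("\n")
    width = max(len(row) for row in rows)
    # Pad first, so that row i of the image stays at offset i in every column.
    padded = [row.ljust(width, fill) for row in rows]
    return ["".join(column) for column in zip(*padded)]

def count_vertical(image: str) -> int:
    # (?=XXX) is zero-width, so overlapping triples in a longer run are all counted.
    return sum(len(re.findall(r"(?=XXX)", column)) for column in transpose_padded(image))
```

For "..X\n.\n..X\n..X" the unpadded rotation turns column 2 into "XXX". The padded version keeps it as "X.XX" and correctly counts 0.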
\A(?:(?=(?(3)[\s\S]*(?=\3\z))(?|.(?=.*\n(\1?+.).*\n(\2?+.))|.*\n()())+?(?<=X)(?=.*\n\1(?<=X).*\n\2(?<=X))(?=([\s\S]*\z)))(?=[\s\S]*([\s\S](?(4)\4)))[\s\S])+[\s\S]*(?=\4\z)|\G(?!\A|[\s\S]?\z)
\A(?:
(?=
(?(3)[\s\S]*(?=\3\z)) # Resume from where we ended last iteration
(?| # Branch-reset group used to reset \1
.(?=.*\n(\1?+.).*\n(\2?+.)) # and \2 to empty values when a new line
| # is reached. ".*\n" is used to skip the
.*\n()() # rest of a line that is longer than the
)+? # ones below it.
(?<=X)(?=.*\n\1(?<=X).*\n\2(?<=X)) # Find a XXX formation
(?=([\s\S]*\z)) # Save the rest of the subject in \3 for
) # when we resume looking next iteration
(?=[\s\S]*([\s\S](?(4)\4))) # For every formation found, consume another
# character at the end of the subject
[\s\S] # Consume a character so we can move on
)+
[\s\S]*(?=\4\z) # When all formations are found, consume
# up to just before \4 at the subject end.
|
\G(?!\A|[\s\S]?\z) # Now we just need to force the rest of the
# matches. The length of \4 is equal to the
# number of formations. So to avoid overmatching,
# we need to exclude a couple of cases.