Perl: How to extract the string between brackets

I have a file in the following text format:

* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[  Abiword Wordprocessor]] (2010/10/27 20:17)
* [[  Sylpheed E-Mail]] (2010/03/30 21:49)
* [[   Kupfer]] (2010/05/16 20:18)

All the words between `[[` and `]]` are a short description of the entry. I need to extract each description as a whole, not word by word.

I found an answer to a similar question here, but I can't understand it:

my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;

Any working approach is welcome, but an explanation would help a lot, e.g. what do `(?0)` and `/xg` mean?

The code could look like this:

use warnings; 
use strict;

my @subjects; # declaring a lexical variable to store all the subjects
my $pattern = qr/ 
  \[ \[    # matching two `[` signs
  \s*      # ... and, if any, whitespace after them
  ([^]]+) # starting from the first non-whitespace symbol, capture all the non-']' symbols
  ]]
/x;

# main processing loop:
while (<DATA>) { # reading the source file line by line
  if (/$pattern/) {      # if line is matched by our pattern
    push @subjects, $1;  # ... push the captured group of symbols into our array
  }
}
print $_, "\n" for @subjects; # print the collected subjects, one per line

__DATA__
* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[  Abiword Wordprocessor]] (2010/10/27 20:17)
* [[  Sylpheed E-Mail]] (2010/03/30 21:49)
* [[   Kupfer]] (2010/05/16 20:18)

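(As you can see, the description in the comments translates quite naturally into a regex. The only thing you may not need is the `/x` regex modifier, which is what lets me comment it so heavily.)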
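A quick way to convince yourself that `/x` only changes readability: the sketch below (the names `$commented` and `$compact` and the choice of sample line are illustrative) compiles the same pattern with and without `/x` and extracts the same subject either way.

```perl
use strict;
use warnings;

# The commented pattern, relying on /x to ignore spaces and comments.
my $commented = qr/
  \[ \[      # two literal '[' characters
  \s*        # optional whitespace
  ([^]]+)    # capture everything up to the next ']'
  ]]         # two literal ']' characters
/x;

# The same pattern on one line without /x, so it must contain no stray spaces.
my $compact = qr/\[\[\s*([^]]+)]]/;

my $line = '* [[  Abiword Wordprocessor]] (2010/10/27 20:17)';
my ($with_x)    = $line =~ $commented;
my ($without_x) = $line =~ $compact;

print "$with_x\n";      # Abiword Wordprocessor
print "$without_x\n";   # Abiword Wordprocessor
```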

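`\[` is a literal `[`, `\]` is a literal `]`, and `.*` matches any sequence of zero or more characters. Whatever is inside parentheses is a capturing group, so you can access it later in the script as `$1` (or `$2`, `$3`, and so on, depending on how many groups you have).

Putting it all together, you match two `[` characters and then everything up to the last occurrence of two consecutive `]` characters.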
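The code this explanation belongs to did not survive, so here is a minimal sketch assuming a pattern of the form `\[\[(.*)\]\]`, as described above. It uses a hypothetical line with two bracketed parts to show what "the last occurrence" means for greedy `.*`, and how the non-greedy `.*?` stops at the first `]]` instead:

```perl
use strict;
use warnings;

# Hypothetical line with two [[...]] parts.
my $line = '* [[ First ]] and [[ Second ]]';

# Greedy .* runs to the LAST "]]" on the line.
my ($greedy) = $line =~ /\[\[(.*)\]\]/;

# Non-greedy .*? stops at the FIRST "]]".
my ($lazy) = $line =~ /\[\[(.*?)\]\]/;

print "greedy: <$greedy>\n";   # greedy: < First ]] and [[ Second >
print "lazy:   <$lazy>\n";     # lazy:   < First >
```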

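Update: rereading your question, I'm suddenly unsure whether you need the content between `[[` and `]]` or the whole line. If it's the whole line, you don't need the parentheses at all: just test whether the pattern matches, without capturing anything.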
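For the whole-line case, a minimal sketch (the sample lines and the name `@entries` are illustrative) filters lines with `grep`, using a pattern that has no capturing group:

```perl
use strict;
use warnings;

my @lines = (
    "* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)\n",
    "some line without a description\n",
    "* [[   Kupfer]] (2010/05/16 20:18)\n",
);

# No parentheses: the pattern only answers "does this line contain [[...]]?"
my @entries = grep { /\[\[.*\]\]/ } @lines;

print @entries;   # prints the Virtualbox and Kupfer lines unchanged
```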

my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;

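The `x` flag means whitespace in the regex is ignored, which allows more readable expressions. The `g` flag means that, in list context, the result is a list of all matches from left to right (match *g*lobally).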
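To see the effect of `/g`, here is a small sketch with a made-up string. In list context, a match without `/g` returns only the captures of the first match, while `/g` returns the captures of every match. Note that the non-recursive pattern `\{[^{}]*\}` used here only finds innermost brace pairs, which is exactly the limitation `(?0)` addresses:

```perl
use strict;
use warnings;

my $str = 'a {b} c {d {e}} f';

# Without /g: only the capture from the first match.
my @first = $str =~ /(\{[^{}]*\})/;

# With /g: the captures from every match, left to right.
my @all = $str =~ /(\{[^{}]*\})/g;

print "without /g: @first\n";   # without /g: {b}
print "with /g:    @all\n";     # with /g:    {b} {e}
```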

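`(?0)` recursively re-enters the entire pattern. Because the whole pattern here is wrapped in the first set of parentheses, that amounts to "the regex inside the first set of parentheses". This makes it a recursive regex, equivalent to a set of rules such as:

E := '{' ( NoBrace | E )* '}'
NoBrace := [^{}]*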
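A minimal sketch of the recursion on a made-up string with nested braces. It follows the quoted regex, except that `[^{}]++` (possessive) is swapped in for `[^{}]*` to avoid needless backtracking; recursion with `(?0)` requires Perl 5.10 or later:

```perl
use strict;
use warnings;
use 5.010;   # (?0) recursion and possessive ++ need Perl 5.10+

my $str = 'x {a {b} c} y {d}';

# Each match is a '{', then runs of non-brace characters or a nested
# brace group (via (?0), the whole pattern again), then the closing '}'.
my @array = $str =~ /( \{ (?: [^{}]++ | (?0) )* \} )/xg;

print "$_\n" for @array;
# {a {b} c}
# {d}
```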

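The answer you found uses recursive pattern matching, which I don't think you need.

* `/x` allows insignificant whitespace and comments in the regexp.
* `/g` runs the regexp over the whole string. Without it, matching stops at the first match.
* `/xg` is simply `/x` and `/g` combined.
* `(?0)` runs the whole regexp again from within itself (recursion).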
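The "stops at the first match" point is easiest to see in scalar context, where `/g` lets a `while` loop walk through the string one match at a time. A minimal sketch using two of the sample lines (the name `@found` is illustrative):

```perl
use strict;
use warnings;

my $text = "* [[  Sylpheed E-Mail]] (2010/03/30 21:49)\n"
         . "* [[   Kupfer]] (2010/05/16 20:18)\n";

my @found;

# In scalar context, each /g match resumes at pos($text), where the previous one ended.
while ($text =~ /\[\[\s*([^\]]*)\]\]/g) {
    push @found, $1;
    print "$1\n";
}
# Sylpheed E-Mail
# Kupfer
```

Without `/g`, the condition would match the first entry on every iteration and the loop would never end.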

If I understand correctly, you need something like this:

use strict;
use warnings;

my $text = "* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)
* [[  Abiword Wordprocessor]] (2010/10/27 20:17)
* [[  Sylpheed E-Mail]] (2010/03/30 21:49)
* [[   Kupfer]] (2010/05/16 20:18)
";

my @array = $text =~ /\[\[([^\]]*)\]\]/g;
print join(",", @array);

# this prints "  Virtualbox Guest Additions,  Abiword Wordprocessor,  Sylpheed E-Mail,   Kupfer"
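The captured strings keep the spaces that follow `[[`. If you want them trimmed (an assumption about what you need), one option is to move `\s*` outside the capturing group, as in this sketch:

```perl
use strict;
use warnings;

my $text = "* [[  Virtualbox Guest Additions]] (2011/10/17 15:19)\n"
         . "* [[   Kupfer]] (2010/05/16 20:18)\n";

# \s* sits outside the parentheses, so leading spaces are matched but not captured.
my @array = $text =~ /\[\[\s*([^\]]*)\]\]/g;

print join(",", @array), "\n";   # Virtualbox Guest Additions,Kupfer
```

Spaces just before `]]` would still be captured; the sample data has none.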

If the text will never contain `]`, you can simply use the following, as suggested earlier:

/\[\[ ( [^\]]* ) \]\]/x
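To see why that condition matters, here is a sketch with a hypothetical entry containing `]` (`a[1]`). The character class `[^\]]*` cannot step past it, so that line does not match at all:

```perl
use strict;
use warnings;

my $ok     = '* [[ Kupfer ]] (2010/05/16 20:18)';
my $tricky = '* [[ Array a[1] ]] (2010/05/16 20:18)';   # hypothetical entry containing ']'

my @results;
for my $line ($ok, $tricky) {
    # [^\]]* stops at the ']' in "a[1]", and "] " is not "]]", so the match fails.
    my ($match) = $line =~ /\[\[ ( [^\]]* ) \]\]/x;
    push @results, defined $match ? "matched <$match>" : 'no match';
}
print "$_\n" for @results;
# matched < Kupfer >
# no match
```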

The following allows `]` in the contained text, but I recommend against incorporating it into a larger pattern:

/\[\[ ( .*? ) \]\]/x
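As an illustration of the risk, here is a sketch with a hypothetical larger pattern that also requires a `(` and a four-digit year after `]]`. On a made-up line whose first `[[...]]` is not followed by a date, `.*?` simply keeps extending past that `]]` until the rest of the pattern can match:

```perl
use strict;
use warnings;

# Hypothetical input: the first [[...]] is not followed by a date.
my $line = '[[a]] junk [[b]c]] (2011/10/17)';

# .*? grows past the first "]]" so that \( and \d{4} can still match.
my ($desc, $year) = $line =~ /\[\[ ( .*? ) \]\] \s* \( (\d{4}) /x;

print "desc=<$desc> year=$year\n";   # desc=<a]] junk [[b]c> year=2011
```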

The following also allows `]` in the contained text and is the most robust solution:

/\[\[ ( (?:(?!\]\]).)* ) \]\]/x
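A sketch with a hypothetical larger pattern (requiring a `(` and a four-digit year after `]]`) on a made-up line whose first `[[...]]` has no date. Because `(?:(?!\]\]).)*` refuses to step onto `]]`, the first candidate is abandoned instead of being stretched, while a single `]` inside the text is still accepted:

```perl
use strict;
use warnings;

my $line = '[[a]] junk [[b]c]] (2011/10/17)';

# (?:(?!\]\]).)* consumes one character at a time, but never one that starts "]]".
my ($desc, $year) = $line =~ /\[\[ ( (?:(?!\]\]).)* ) \]\] \s* \( (\d{4}) /x;

print "desc=<$desc> year=$year\n";   # desc=<b]c> year=2011
```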

For example:

if (my ($match) = $line =~ /\[\[ ( (?:(?!\]\]).)* ) \]\]/x) {
   print "$match\n";
}
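To collect every match in one step instead of testing line by line, the same pattern can be combined with `/g` in list context. A minimal sketch, assuming the whole file has already been read into `$file` (here, two sample lines):

```perl
use strict;
use warnings;

# Assumed: the file's contents are in $file.
my $file = "* [[  Abiword Wordprocessor]] (2010/10/27 20:17)\n"
         . "* [[  Sylpheed E-Mail]] (2010/03/30 21:49)\n";

# With /g in list context, every captured description is returned at once.
my @matches = $file =~ /\[\[ ( (?:(?!\]\]).)* ) \]\]/xg;

print "<$_>\n" for @matches;
# <  Abiword Wordprocessor>
# <  Sylpheed E-Mail>
```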

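* `/x`: Ignore whitespace in the pattern. This lets you add spaces to make the pattern readable without changing its meaning. Documented in …
* `/g`: Find all matches. Documented in …
* `(?0)` was used to make the pattern recursive, because the linked post had to handle arbitrarily nested curly braces.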
