Regex: How to use Perl's regular expressions

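I need to extract several sections from a multi-line string with Perl. I apply the same regex in a while loop. My problem is getting the last section, which ends at the end of the file. My workaround is to append an end marker, so that the regex always finds an end. Is there a better way?

Example file: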

Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Perl script:

#!/usr/bin/env perl

my $desc = do { local $/ = undef; <> };

$desc .= "\n===="; # set the end marker

while($desc =~ /^==== (?<filename>.*?)#.*?====$(?<content>.*?)(?=^====)/mgsp) {
  print "filename=", $+{filename}, "\n";
  print "content=", $+{content}, "\n";
}

This way the script finds both sections. How can I avoid appending the marker?

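Use of the non-greedy modifier (?) is a huge red flag. You can usually get away with using it once in a pattern, but more than that is usually a bug. If you want to match text that doesn't contain a string, use the following instead: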

(?:(?!STRING).)*

So you get the following:

/
   ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
   (?<content> (?:(?! ^==== ).)* )
/xsmg

Code:

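my $desc = do { local $/; <DATA> };

while (
   $desc =~ /
      ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
      (?<content> (?:(?! ^==== ).)* )
   /xsmg
) {
   print "filename=<<$+{filename}>>\n";
   print "content=<<$+{content}>>\n";
}

__DATA__
Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2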

Output:

filename=<</home/src/file1.c#1>>
content=<<content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

>>
filename=<</home/src/file2.c#1>>
content=<<content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2
>>
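
To see why this idiom removes the need for an end marker, here is a minimal sketch of it in isolation. The helper name text_before is hypothetical and not part of the answer; it returns whatever precedes the first occurrence of a literal stop string, and the same pattern simply runs to the end of the string when the stop string never appears.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): return the text that precedes
# the first occurrence of a literal stop string, or the whole text if the
# stop string never occurs.
sub text_before {
    my ($text, $stop) = @_;

    # (?!\Q$stop\E) refuses to step onto a position where $stop begins;
    # the dot then consumes exactly one character. /s lets the dot match
    # newlines, so the match can span several lines.
    my ($head) = $text =~ /\A((?:(?!\Q$stop\E).)*)/s;
    return $head;
}

# Stops right before the marker.
print text_before("alpha\nbeta\n====\ngamma", "===="), "|\n";

# No marker at all: the match ends at the end of the string.
print text_before("no marker here", "===="), "|\n";
```

Because the repetition ends either at the stop string or at the end of the input, the last section of the file is captured without appending anything to it.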

First of all, you are slurping the whole file, which makes things trickier. If you read the file line by line, this is relatively simple:

use strict;
use warnings 'all';

my $file;

while ( <> ) {
    if ( /^====\s+(.*\S)#\S*\s+====/ ) {
        $file = $1;
        print "filename=$file\n";
        print 'content=';
    }
    elsif ( $file ) {
        print;
    }
}

Or, if you need to store the whole content of each file, perhaps in a hash, it would look like this:

use strict;
use warnings 'all';

my $file;
my %data;

while ( <> ) {
    if ( /^====\s+(.*\S)#\S*\s+====/ ) {
        $file = $1;
    }
    elsif ( $file ) {
        $data{$file} .= $_;
    }
}

for my $file ( sort keys %data ) {
    print "filename=$file\n";
    print "content=$data{$file}";
}

The output is identical to that of the first version.

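You use [ ] because that's the way to put a space into an extended regex?

You could use a backslash-escaped space, but I find [ ] more readable. Two other options: \x20 and \N{SPACE}.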
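
As a rough illustration of that exchange, the sketch below checks each spelling of a literal space against one header line under /x. The header string comes from the example file; the matches_header helper and the @patterns table are made up for this demonstration.

```perl
use strict;
use warnings;
use charnames ':full';   # makes \N{SPACE} usable on older perls too

my $header = "==== /home/src/file1.c#1 ====";

# Each entry pairs a label with a pattern that spells the space differently.
# Under /x, unescaped whitespace in the pattern is ignored, so the last
# entry effectively reads ^====(\S+)====$ and cannot match the header.
my @patterns = (
    [ '[ ]',        qr/^====[ ](\S+)[ ]====$/x ],
    [ '\ ',         qr/^====\ (\S+)\ ====$/x ],
    [ '\x20',       qr/^====\x20(\S+)\x20====$/x ],
    [ '\N{SPACE}',  qr/^====\N{SPACE}(\S+)\N{SPACE}====$/x ],
    [ 'bare space', qr/^==== (\S+) ====$/x ],
);

# Hypothetical helper: 1 if the pattern matches the sample header, else 0.
sub matches_header {
    my ($re) = @_;
    return $header =~ $re ? 1 : 0;
}

for my $p (@patterns) {
    printf "%-10s %s\n", $p->[0], matches_header( $p->[1] ) ? 'matches' : 'no match';
}
```

The first four patterns match the header; the bare-space one does not, which is why one of the explicit spellings is needed whenever /x is in effect.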

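I see. I'm trying to understand the expression ^==== (?<content> (?:(?! ^==== ).)* ). I know that it captures everything after ^==== that does not contain ^====. The ?: avoids capturing the expression in the parentheses. I don't get the ).)* construct.

).)* is not a valid regex pattern. I already explained what (?:(?!PAT).)* does. It's just (?! ^==== ). inside of (?:...)*. Do you understand each of those? A bunch ( (?:...)* ) of characters that aren't the start of ^==== ( (?! ^==== ). ).

Yes, I got this later that evening. Thanks for explaining it again.

I found a way to solve the problem that I'm comfortable with. To me, this solution looks more complicated. My example is just the core of my script. I analyse each content afterwards and need it in a string variable to pass to a sub. I also need the list of files in that line plus additional information, which I left out of my original question to keep it simple. Remember Perl's motto: TIMTOWTDI.

@SteffenRoller: Your original approach was clearly too complicated, because you needed help with it! It is very common to make regexes do too much: there are usually more appropriate tools in the language. If you have a better solution then you should show it here as another answer. Remember that the main purpose of your question is to help others with similar problems. You should also consider opening a scalar variable for input, using open my $fh, '...

You are entitled to your opinion. My final comment on this issue is: [...]. It's on the internet, so it must be true :-).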
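
The code in the comment above is cut off, so what follows is only a guess at what was meant: Perl's three-argument open accepts a reference to a scalar and returns an in-memory filehandle, which lets an already-slurped string be processed line by line. The helper name sections_from_string and the shortened sample input are assumptions made for this sketch; the header regex is the one from the line-by-line answer.

```perl
use strict;
use warnings;

# Hypothetical helper: split an already-slurped string into sections by
# reading it line by line through an in-memory filehandle. Returns a
# reference to a hash that maps each file name to its content.
sub sections_from_string {
    my ($desc) = @_;

    # Opening a reference to a scalar reads from the string itself.
    open my $fh, '<', \$desc or die "Cannot open in-memory handle: $!";

    my ( $file, %data );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^====\s+(.*\S)#\S*\s+====/ ) {
            $file = $1;           # a header line starts a new section
            $data{$file} = '';
        }
        elsif ( defined $file ) {
            $data{$file} .= $line;   # body line of the current section
        }
    }
    close $fh;
    return \%data;
}

# Shortened sample input, in the format of the example file.
my $desc = "Header\n\n"
         . "==== /home/src/file1.c#1 ====\ncontent file1\n\n"
         . "==== /home/src/file2.c#1 ====\ncontent file2\n";

my $data = sections_from_string($desc);
for my $file ( sort keys %$data ) {
    print "filename=$file\n";
    print "content=$data->{$file}";
}
```

Each value in the returned hash is a plain string, so it can be handed to another sub for further analysis, which is the requirement mentioned in the comments.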

Output of the line-by-line script:

filename=/home/src/file1.c
content=content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

filename=/home/src/file2.c
content=content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2
