Regex: removing duplicates with a regular expression

Input:

OUT :abc123: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT :abc123 : : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT bcd111: : Succeeded.

I only want to filter the hosts that matched "Warning".

Output:

abc123
abc1234
bcd111

I tried the regex below, but it matches every occurrence:

([\w]+)\s+:\s+:\s+Warning

Is it possible to avoid the duplicates using a regex?

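When you hear "unique" in Perl, think "hash":

#!/usr/bin/perl
use warnings;
use strict;

my %uniq;
while (<>) {
    # Capture the host name; using it as a hash key stores each host only once.
    /:?(\S+?)[:\s]+Warning/ and $uniq{$1} = 1;
}
print "$_\n" for keys %uniq;

By the way, your input and regex do not produce the output you specified. I changed the regex, but I am not sure your sample input is correct. Is the placement of the colons really that erratic?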

You can use this Perl one-liner:

perl -lane 'if (/\bWarning\b/) { $F[1] =~ s/\W+//g; print $F[1] }' file
abc123
abc123
abc1234
abc1234
abc1234
bcd111
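
The one-liner above prints a host once per matching line, so duplicates remain. A minimal sketch of one way to print each host only once, in first-seen order, is shown below. The sub name warning_hosts and the shortened sample lines are illustrative assumptions and are not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the host of every "Warning" line, each host once, in first-seen order.
# (warning_hosts is a hypothetical helper name used only for this sketch.)
sub warning_hosts {
    my @lines = @_;
    my (%seen, @hosts);
    for my $line (@lines) {
        next unless $line =~ /\bWarning\b/;
        my $host = (split ' ', $line)[1];   # second whitespace-separated field, like $F[1]
        next unless defined $host;
        $host =~ s/\W+//g;                  # strip the stray colons around the host
        push @hosts, $host unless $seen{$host}++;
    }
    return @hosts;
}

# Sample lines shortened from the question's input (assumption: same layout).
my @sample = (
    'OUT :abc123: : Warning: /var/tmp/abc123.fw old',
    'OUT :abc123 : : Warning: /var/tmp/abc123.fw.sched old',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old',
    'OUT bcd111: : Succeeded.',
);

print "$_\n" for warning_hosts(@sample);
# prints:
# abc123
# abc1234
# bcd111
```

The post-increment in $seen{$host}++ returns 0 the first time a host is seen, so the push happens exactly once per host.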

You can try this. See the demo. Grab the capture.

Use this pattern with the g and s options:

OUT\s*:?([^:]+):\s*:\s*Warning(?!.*?\1\s*:\s*:\s*Warning)  
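
As a rough illustration of how this pattern behaves, the sketch below applies it to the whole log held in one string. The negative lookahead rejects a Warning line when the same captured host has another Warning line later in the text, so only the last occurrence per host matches. The sub name last_warning_hosts and the shortened sample text are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return each host whose "Warning" line is the last one for that host in $text.
# (last_warning_hosts is a hypothetical helper name used only for this sketch.)
sub last_warning_hosts {
    my ($text) = @_;
    # /g collects every match, /s lets "." in the lookahead cross newlines.
    my @hosts = $text =~ /OUT\s*:?([^:]+):\s*:\s*Warning(?!.*?\1\s*:\s*:\s*Warning)/gs;
    s/\s+\z// for @hosts;   # the capture can carry a trailing space, e.g. "abc123 "
    return @hosts;
}

# Sample text shortened from the question's input (assumption: same layout).
my $log = <<'END';
OUT :abc123: : Warning: /var/tmp/abc123.fw old
OUT :abc123 : : Warning: /var/tmp/abc123.fw.sched old
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old
OUT bcd111: : Succeeded.
END

print "$_\n" for last_warning_hosts($log);
# prints:
# abc123
# abc1234
# bcd111
```

Note that the backreference compares the captured text literally, including any space before the colon. If the spacing around a host differs between lines in a different order than in this sample, duplicates may still slip through, which is one reason a hash-based approach tends to be more robust.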

This is more of a supplement to @choroba's answer above, since he nailed it with "when you hear 'unique', think 'hash'". You should accept @choroba's answer :-)

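Here I simplified the regex part of your question to a call to grep in order to focus on uniqueness, changed the data in the file slightly (so that it fits here), and saved it as dups.log: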

# dups.log 
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Succeeded.

This one-liner produces the output below:

perl -E '++$seen{$_} for grep { /Warning/ } <>; print keys %seen' dups.log

OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)

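The same holds if you key the hash on a regex capture instead of on the whole line as above: each captured value creates only one copy of the key, which ++ then increments, so in the end you get the "unique" keys in the %seen hash, a la uniq.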
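
A minimal sketch of that regex-capture variant follows. It reuses the host-capturing regex from @choroba's answer (assumed here to be /:?(\S+?)[:\s]+Warning/) on the dups.log data; the sub name count_warning_hosts is a hypothetical helper introduced for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count "Warning" lines per host; the hash keys are the unique hosts.
# (count_warning_hosts is a hypothetical helper name used only for this sketch.)
sub count_warning_hosts {
    my @lines = @_;
    my %seen;
    for (@lines) {
        # Each captured host becomes a single key; ++ only bumps its count.
        ++$seen{$1} if /:?(\S+?)[:\s]+Warning/;
    }
    return %seen;
}

# The dups.log data from above, embedded so the sketch is self-contained.
my @log = split /\n/, <<'END';
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Succeeded.
END

my %seen = count_warning_hosts(@log);
print "$_: $seen{$_}\n" for sort keys %seen;
# prints:
# abc123: 2
# abc1234: 3
# bcd111: 3
```

The counts come for free with ++; if only the host names matter, keys %seen is all that is needed.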


This is a neat Perl trick you will never forget :-)

It is probably best to iterate over the lines and populate a hash.