Regex: removing duplicates with a regular expression
Input:
OUT :abc123: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT :abc123 : : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT bcd111: : Succeeded.
I only want to pick out the hosts that matched "Warning", without duplicates.
Output:
abc123
abc1234
bcd111
I tried the regex below, but it matches every occurrence, duplicates included:
([\w]+)\s+:\s+:\s+Warning
Is it possible to avoid the duplicates using a regex?
When you hear "unique" in Perl, think "hash":
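#!/usr/bin/perl
use warnings;
use strict;

my %uniq;
while (<>) {
    # Capture the host name in front of the colons on each Warning line;
    # a hash key can only exist once, so repeated hosts collapse.
    /:?(\S+?)[:\s]+Warning/ and $uniq{$1} = 1;
}
print "$_\n" for sort keys %uniq;    # sorted for a stable order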
By the way, your input and regex don't produce the output you specified. I changed the regex, but I'm not sure your input sample is right: is the placement of the colons really that crazy?
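On the colon placement: a quick self-contained check suggests the capture in the script above copes with all three layouts in the sample (":abc123:", ":abc123 " and "abc1234:"), assuming those are the only variants. The sample strings are shortened copies of the question's lines, and @samples and @captures are names made up for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened copies of the question's lines, one per colon layout,
# plus a line without "Warning".
my @samples = (
    'OUT :abc123: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be',
    'OUT :abc123 : : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT bcd111: : Succeeded.',
);

my @captures;
for my $line (@samples) {
    # In list context the match returns the capture, or an empty list
    # when the line has no Warning, which leaves $host undefined.
    my ($host) = $line =~ /:?(\S+?)[:\s]+Warning/;
    push @captures, defined $host ? $host : '(no match)';
}
print "$_\n" for @captures;
```

Each Warning line yields the bare name (abc123, abc123, abc1234) and the "Succeeded." line yields no match.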
You can use this perl one-liner:
perl -lane 'if (/\bWarning\b/) { $F[1] =~ s/\W+//g; print $F[1] }' file
abc123
abc123
abc1234
abc1234
abc1234
bcd111
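As the output shows, this prints one name per Warning line, so duplicates remain. A sketch that keeps the same field-splitting logic but adds a %seen hash, so each host is printed once in order of first appearance (the sample lines are embedded here instead of being read from a file):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's sample lines, embedded so the sketch runs without a file.
my @lines = (
    'OUT :abc123: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)',
    'OUT :abc123 : : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT abc1234: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)',
    'OUT bcd111: : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)',
    'OUT bcd111: : Succeeded.',
);

my %seen;     # host => number of Warning lines seen so far
my @hosts;    # hosts in order of first appearance
for my $line (@lines) {
    next unless $line =~ /\bWarning\b/;
    my @F = split ' ', $line;          # same whitespace split that -a does
    (my $host = $F[1]) =~ s/\W+//g;    # drop the colons around the name
    push @hosts, $host unless $seen{$host}++;    # keep the first occurrence only
}
print "$_\n" for @hosts;
```

As a one-liner the same idea would look something like: perl -lane 'if (/\bWarning\b/) { (my $h = $F[1]) =~ s/\W+//g; print $h unless $seen{$h}++ }' file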
You can try this. See the demo, and grab the captures.
Use this pattern with the g and s options:
OUT\s*:?([^:]+):\s*:\s*Warning(?!.*?\1\s*:\s*:\s*Warning)
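A sketch of how this pattern might be applied from Perl, assuming the whole input is read into one string (the lookahead has to see the lines that come after each match). The negative lookahead rejects any host that has another Warning line further down, so each host is reported from its last Warning line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Whole input as ONE string: the lookahead must see the following lines.
my $text = <<'END';
OUT :abc123: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT :abc123 : : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT bcd111: : Succeeded.
END

my @hosts;
# /g walks through every match; /s lets .*? in the lookahead cross newlines.
# A match is rejected if the same captured name has a later Warning line.
while ($text =~ /OUT\s*:?([^:]+):\s*:\s*Warning(?!.*?\1\s*:\s*:\s*Warning)/gs) {
    (my $host = $1) =~ s/^\s+|\s+$//g;    # ([^:]+) can keep a trailing space
    push @hosts, $host;
}
print "$_\n" for @hosts;
```

Two caveats: the lookahead compares the captured text literally, so if the "abc123 " line (with the space) came before the "abc123:" one, abc123 would be reported twice; and each match rescans the rest of the input, which may be slow on large logs.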
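This is more of an addendum to @choroba's answer above, since he put it well: "when you hear 'unique', think 'hash'". You should accept @choroba's answer :-) Here I've reduced the regex part of your question to a call to grep, to focus on the uniqueness part, changed the data slightly (so that it fits here) and saved it as dups.log: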
# dups.log
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Succeeded.
The output of this one-liner (in hash order, which can vary between runs) is:
perl -E '++$seen{$_} for grep { /Warning/ } <>; print keys %seen' dups.log
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
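But just as with the whole lines above, when you key the hash on what your regex matches and captures, only one copy of each key is created and ++ simply increments it, so you end up with "unique" keys in your %hash, à la uniq.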
It's a neat Perl trick you'll never forget :-)
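Keyed on the captured host name rather than the whole line, the same ++ trick also yields a count per host. A minimal sketch, assuming the capture regex from @choroba's answer and the dups.log lines embedded as an array:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The shortened lines from dups.log, embedded for a self-contained demo.
my @lines = (
    'OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)',
    'OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)',
    'OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)',
    'OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)',
    'OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)',
    'OUT bcd111: : Succeeded.',
);

my %seen;
for (@lines) {
    # The first ++ creates the key for a host; later ones only bump its count.
    ++$seen{$1} if /:?(\S+?)[:\s]+Warning/;
}
printf "%-8s %d\n", $_, $seen{$_} for sort keys %seen;
```

For this data it should list abc123 with 2, abc1234 with 3 and bcd111 with 3.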
References:
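- There are some good explanations of using a hash for uniq, as in @choroba's answer
- This is covered in an article describing the %seen{} hash trick
- Perlmaven shows how to roll your own using this approach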
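The %seen trick mentioned in these references is often packaged as a small uniq helper. A sketch (uniq here is our own sub, not a built-in) that keeps the first copy of each Warning line from dups.log and, unlike printing keys %seen, preserves the input order:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical home-made helper: keeps the first occurrence of each value
# in input order. $seen{$_}++ returns 0 (false) only the first time.
sub uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# The shortened lines from dups.log, embedded for a self-contained demo.
my @lines = (
    'OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)',
    'OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)',
    'OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)',
    'OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)',
    'OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)',
    'OUT bcd111: : Succeeded.',
);

my @unique = uniq(grep { /Warning/ } @lines);
print "$_\n" for @unique;
```

Newer versions of List::Util (1.45 and later) also export a uniq function, if pulling in that module is an option.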
It's probably best to iterate over the lines and populate a hash.