Bash: How to print the first occurrences of a column that matches multiple times with awk


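I have a log file that lists all backups, with a column whose value is YES when the backup will not be deleted by the retention policy (it is retained). For a given vmname there may be one or more rows with the retained column = YES.

My input is: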

=    FULL     ==   20210105   ==     2100     == ASR-FULL-20210105-2100 ==  YES
=    FULL     ==   20210202   ==     2100     == ASR-FULL-20210202-2100 ==  YES
=    FULL     ==   20210302   ==     2100     == ASR-FULL-20210302-2100 ==  YES
=    FULL     ==   20210406   ==     2100     == ASR-FULL-20210406-2100 ==  YES
=    FULL     ==   20210105   ==     2146     == DNS10_7-FULL-20210105-2146 ==  YES
=    FULL     ==   20210202   ==     2153     == DNS10_7-FULL-20210202-2153 ==  YES
=    FULL     ==   20210302   ==     2148     == DNS10_7-FULL-20210302-2148 ==  YES
=    FULL     ==   20210406   ==     2122     == DNS10_7-FULL-20210406-2122 ==  YES
=    FULL     ==   20210105   ==     2105     == execnet.0-FULL-20210105-2105 ==  YES
=    FULL     ==   20210202   ==     2106     == execnet.0-FULL-20210202-2106 ==  YES
=    FULL     ==   20210302   ==     2106     == execnet.0-FULL-20210302-2106 ==  YES
=    FULL     ==   20210406   ==     2105     == execnet.0-FULL-20210406-2105 ==  YES
=    FULL     ==   20210106   ==     0200     == Prtgadmin.0-FULL-20210106-0200 ==  YES
=    FULL     ==   20210105   ==     2216     == sandbox.0-FULL-20210105-2216 ==  YES
=    FULL     ==   20210202   ==     2227     == sandbox.0-FULL-20210202-2227 ==  YES
=    FULL     ==   20210406   ==     2152     == sandbox.0-FULL-20210406-2152 ==  YES
=    FULL     ==   20210105   ==     2236     == wwwp.0-FULL-20210105-2236 ==  YES
=    FULL     ==   20210202   ==     2249     == wwwp.0-FULL-20210202-2249 ==  YES
=    FULL     ==   20210105   ==     2259     == wwws.0-FULL-20210105-2259 ==  YES
=    FULL     ==   20210202   ==     2314     == wwws.0-FULL-20210202-2314 ==  YES
=    FULL     ==   20210105   ==     2259     == webhost.0-FULL-20210105-2259 ==  YES
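The output I want is the n-1 oldest matches (the top n-1) for each VM.

So far, running the awk command below gets me a result, but it shows the most recent matches. I would also prefer a single awk command. The year filter is not that important.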

# cat bkp_list.log| grep -E '*2021.*YES'| awk -F[==-] 'cnt[$8]++{if (cnt[$8]>1) print prev=$0;next}' |awk -F[==] '{print $8}' 

Thanks.

If you want to filter on YES in the last column, you can put a condition expression before the action block, such as $NF == "YES".

$ cat file
=    FULL     ==   20210105   ==     2100     == ASR-FULL-20210105-2100 ==  NO
=    FULL     ==   20210202   ==     2100     == ASR-FULL-20210202-2100 ==  YES
=    FULL     ==   20210302   ==     2100     == ASR-FULL-20210302-2100 ==  YES
=    FULL     ==   20210406   ==     2100     == ASR-FULL-20210406-2100 ==  YES
=    FULL     ==   20210105   ==     2146     == DNS10_7-FULL-20210105-2146 ==  YES
=    FULL     ==   20210202   ==     2153     == DNS10_7-FULL-20210202-2153 ==  YES
=    FULL     ==   20210302   ==     2148     == DNS10_7-FULL-20210302-2148 ==  YES
=    FULL     ==   20210406   ==     2122     == DNS10_7-FULL-20210406-2122 ==  YES
=    FULL     ==   20210105   ==     2105     == execnet.0-FULL-20210105-2105 ==  YES
=    FULL     ==   20210202   ==     2106     == execnet.0-FULL-20210202-2106 ==  YES
=    FULL     ==   20210302   ==     2106     == execnet.0-FULL-20210302-2106 ==  YES
=    FULL     ==   20210406   ==     2105     == execnet.0-FULL-20210406-2105 ==  YES
=    FULL     ==   20210106   ==     0200     == Prtgadmin.0-FULL-20210106-0200 ==  YES
=    FULL     ==   20210105   ==     2216     == sandbox.0-FULL-20210105-2216 ==  YES
=    FULL     ==   20210202   ==     2227     == sandbox.0-FULL-20210202-2227 ==  YES
=    FULL     ==   20210406   ==     2152     == sandbox.0-FULL-20210406-2152 ==  YES
=    FULL     ==   20210105   ==     2236     == wwwp.0-FULL-20210105-2236 ==  YES
=    FULL     ==   20210202   ==     2249     == wwwp.0-FULL-20210202-2249 ==  YES
=    FULL     ==   20210105   ==     2259     == wwws.0-FULL-20210105-2259 ==  YES
=    FULL     ==   20210202   ==     2314     == wwws.0-FULL-20210202-2314 ==  YES
=    FULL     ==   20210105   ==     2259     == webhost.0-FULL-20210105-2259 ==  YES
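Note: I changed the YES on the first line to NO to check that the behavior is correct.

Anyway, if you need any other special filtering, such as checking the year, please specify.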
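
As a rough sketch of what such an extra filter could look like, the function below combines the YES condition with a year check on the date column. The name retained_in_year and its two arguments are hypothetical, not part of the original answer, and the field numbers assume the default whitespace splitting of the sample log.

```shell
# Hypothetical helper: list retained (YES) backups whose date starts with a given year.
# With default whitespace splitting: $4 = date, $(NF-2) = backup name, $NF = YES/NO.
retained_in_year() {
    awk -v year="$1" '
    # index() returns 1 only when the date begins with the requested year
    $NF == "YES" && index($4, year) == 1 { print $(NF-2) }
    ' "$2"
}
```

It could be called as retained_in_year 2021 file. Because the condition sits in front of the action block, rows that fail either test are skipped without a separate grep.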


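To print all but the last match of each $8 substring, you can use this awk:

awk '
$NF != "YES" {next}          # skip rows that are not retained
{
   s = $8
   sub(/-FULL-.*/, "", s)    # s is the VM name without the -FULL-date-time suffix
}
s == ps {                    # same VM as the previous row, so that row was not the last
   print pval
}
{
   ps = s                    # remember the VM name and full value for the next row
   pval = $8
}' file
ASR-FULL-20210105-2100
ASR-FULL-20210202-2100
ASR-FULL-20210302-2100
DNS10_7-FULL-20210105-2146
DNS10_7-FULL-20210202-2153
DNS10_7-FULL-20210302-2148
execnet.0-FULL-20210105-2105
execnet.0-FULL-20210202-2106
execnet.0-FULL-20210302-2106
sandbox.0-FULL-20210105-2216
sandbox.0-FULL-20210202-2227
wwwp.0-FULL-20210105-2236
wwws.0-FULL-20210105-2259
Or as a one-liner:

awk '$NF != "YES" {next} {s=$8; sub(/-FULL-.*/, "", s)} s == ps {print pval} {ps=s; pval=$8}' file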
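
The previous-row comparison relies on all rows for a VM being adjacent in the log. If that cannot be guaranteed, one possible variation is to remember the latest value per VM in an array instead. This is only a sketch under that assumption; the function name all_but_last and the array name last are hypothetical.

```shell
# Hypothetical helper: print every retained backup except the newest one per VM,
# without requiring that the rows of a VM be adjacent in the log.
all_but_last() {
    awk '
    $NF != "YES" { next }               # skip backups that are not retained
    {
        vm = $8
        sub(/-FULL-.*/, "", vm)         # strip "-FULL-<date>-<time>" to get the VM name
        if (vm in last) print last[vm]  # a newer row exists, so the stored one is not the last
        last[vm] = $8                   # remember this row as the newest seen so far
    }' "$1"
}
```

Each value is printed at the moment a newer row for the same VM shows up, so for a log grouped by VM the output order matches the input order.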

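With GNU awk, gensub() can build the key inline as seen[gensub(/-.*/,"",1,$8)]++, so no temporary variable is needed.

Or with any awk: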

$ tac file | awk '$NF!="YES"{next} {k=$8; sub(/-.*/,"",k)} seen[k]++{print $8}' | tac
ASR-FULL-20210105-2100
ASR-FULL-20210202-2100
ASR-FULL-20210302-2100
DNS10_7-FULL-20210105-2146
DNS10_7-FULL-20210202-2153
DNS10_7-FULL-20210302-2148
execnet.0-FULL-20210105-2105
execnet.0-FULL-20210202-2106
execnet.0-FULL-20210302-2106
sandbox.0-FULL-20210105-2216
sandbox.0-FULL-20210202-2227
wwwp.0-FULL-20210105-2236
wwws.0-FULL-20210105-2259
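
Where tac is not available (it is a GNU coreutils tool), a similar result can probably be had by letting awk read the file twice: the first pass counts the retained rows per key, and the second pass prints every row except the last one for each key. The sketch below assumes the same key rule as the command above; the function name all_but_last_two_pass and the array names total and seen are hypothetical.

```shell
# Hypothetical helper: same idea as the tac pipeline, but reading the file twice
# instead of reversing it.
all_but_last_two_pass() {
    awk '
    $NF != "YES" { next }               # applies to both passes
    { k = $8; sub(/-.*/, "", k) }       # key = text before the first "-"
    NR == FNR { total[k]++; next }      # pass 1: count retained rows per key
    ++seen[k] < total[k] { print $8 }   # pass 2: print all but the last row per key
    ' "$1" "$1"
}
```

As with the tac version, the key is everything before the first "-", so a VM name that itself contains a hyphen would be shortened and could be grouped with other VMs sharing that prefix.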


Why is the line == ASR-FULL-20210105-2100 == YES missing from your output? I don't get the logic.

It is extracted from the output of another backup tool command that lists the backups.

Sorry, I was away. Thank you.

Yes, you are right, I meant to have two consecutive "=" in my filter. I will use your suggestion in the meantime.

@anubhava What do you mean? It is on the first line: ASR-FULL-20210105-2100 ==

Sorry, I meant the n-1 oldest matches (the top n-1 matches). I have edited the OP. Thanks, everyone, for the help.

No, I never implied that anyone should have to figure it out on their own. It was a mistake in the first version of my post, and the correction has been added.

The year really isn't that important. I think your approach is good, because I no longer need the awk and grep parts of the command in the OP. But I still need to show every occurrence except the last one for each VM's FULL-* backups (i.e. the previous 2 out of 3). We are almost there: awk '$NF=="YES"{print $(NF-2)}' file.log

By the way, the earlier cnt syntax, awk -F[=-] 'cnt[$1]++{if (cnt[$1]>1) print prev=$0;next}', does not print the top n-1 occurrences, so I am not there yet.

This looks great. I am trying it on one line, since I would like it to run as a one-liner.

awk '!/2021.*YES/{next} {s=$8; sub(/-FULL-.*/, "", s)} s == ps {print pval} {ps=s; pval=$8}' file is a one-liner, and it is exactly what I wanted. You could add a variant without 2021. I will mark it as the accepted answer. Thank you very much.

awk '!/.*YES/{next} {s=$8; sub(/-FULL-.*/, "", s)} s == ps {print pval} {ps=s; pval=$8}' file. Thank you very much, sir!!

Awesome, anubhava.
Checking the condition filter against the modified file, where only the first row is NO:

$ awk '$NF == "NO" { print $(NF-2) }' file
ASR-FULL-20210105-2100

The GNU awk gensub() variant of the tac pipeline:

$ tac file | awk '$NF=="YES" && seen[gensub(/-.*/,"",1,$8)]++{print $8}' | tac
ASR-FULL-20210105-2100
ASR-FULL-20210202-2100
ASR-FULL-20210302-2100
DNS10_7-FULL-20210105-2146
DNS10_7-FULL-20210202-2153
DNS10_7-FULL-20210302-2148
execnet.0-FULL-20210105-2105
execnet.0-FULL-20210202-2106
execnet.0-FULL-20210302-2106
sandbox.0-FULL-20210105-2216
sandbox.0-FULL-20210202-2227
wwwp.0-FULL-20210105-2236
wwws.0-FULL-20210105-2259