Awk: categorize a column and count the warnings in it


I have a file named out.txt that looks like this:

    Statement 1 Statement 2 Statement 3 Statement 4
    The declaration is not done / Exp / * / This is expected
    The declaration is starting/started / St / * / This is not expected
    The declaration is not yet designed / Yt / & / This is a major one
    The declaration is confirmed / Exp / * / This is okay
    The declaration is not confirmed / Ntp / & / This is a major issue

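I need to summarize and categorize the rows by column 3 (Statement 3): a * marks a warning and a & marks an error. The expected result is as follows: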


    Out:
    Warnings:
    Exp : 2
    St : 1
    Total : 3
    Errors:
    Yt : 1
    Ntp : 1
    Total : 2
I tried the following code, but it does not produce the exact output:

    #!/bin/bash
    echo " " ;
    File="out.txt"
    for z in out.txt; do
        if grep -q "&" $z/"$File"; then
            echo "$z:";
            awk -F' / ' { a[$2]++ } END{ for(j in a){ print j, a[j]; s=s+a[j] }; print "Total :", s}' out.txt
        else
            echo "$z:";
    done
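EDIT2: Since the OP confirmed that there is no fixed keyword for errors, the decision should be based on the & marker in the second-to-last field of each line. In that case, try the following: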

    awk -F'/' '
    match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
      val=substr($0,RSTART,RLENGTH)      # first " / name /" chunk
      gsub(/[[:space:]]+|\//,"",val)     # strip spaces and slashes, leaving the name
      str=$(NF-1)                        # second-to-last field holds * or &
      gsub(/ +/,"",str)
      if(str=="&"){
        countEr[val]++
      }
      else{
        countSu[val]++
      }
      val=str=""
    }
    END{
      print "Out:" ORS "Warnings:"
      for(i in countSu){
        print "\t"i,countSu[i]
        sumSu+=countSu[i]
      }
      print "Total:"sumSu
      print "Errors:"
      for(i in countEr){
        print "\t"i,countEr[i]
        sumEr+=countEr[i]
      }
      print "Total:"sumEr
    }' Input_file
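For readers who want to try the idea on the sample data, here is a minimal, self-contained sketch of the same classification written a different way. It is not the answerer's code: it splits each row on " / " with a regular-expression field separator instead of using match()/substr(), and it remembers the first-seen order of names so the output order is predictable. The function name `summarize`, the temporary sample file, and the header layout are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sample file reproducing the rows from the question.
# The header layout (no slashes) is an assumption.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Statement 1 Statement 2 Statement 3 Statement 4
The declaration is not done / Exp / * / This is expected
The declaration is starting/started / St / * / This is not expected
The declaration is not yet designed / Yt / & / This is a major one
The declaration is confirmed / Exp / * / This is okay
The declaration is not confirmed / Ntp / & / This is a major issue
EOF

# summarize FILE: count the column-2 names, grouped by the marker in column 3.
summarize() {
    # The separator requires blanks around "/", so "starting/started" is not split.
    awk -F'[ \t]+/[ \t]+' '
    NF >= 4 {                             # skips the header and malformed rows
        kind = ($3 == "&") ? "E" : "W"    # & is an error, anything else a warning
        if (!((kind, $2) in count))       # remember first-seen order per group
            order[kind, ++n[kind]] = $2
        count[kind, $2]++
    }
    END {
        print "Out:"
        report("Warnings:", "W")
        report("Errors:", "E")
    }
    function report(title, kind,    i, name, total) {
        print title
        for (i = 1; i <= n[kind]; i++) {
            name = order[kind, i]
            print name " : " count[kind, name]
            total += count[kind, name]
        }
        print "Total : " (total + 0)
    }' "$1"
}

summarize "$sample"
```

Because the names are replayed in first-seen order rather than through `for (i in array)`, whose order awk leaves unspecified, the sketch lists Exp before St and Yt before Ntp for the sample rows.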

EDIT: A generic solution in which all error names are given in a variable, so the conditions do not have to be written out by hand as in my earlier solution. Written and tested only in GNU awk against the samples shown; please try the following:

    awk -v errors="Ntp,Yt" '
    BEGIN{
      num=split(errors,arr,",")
      for(i=1;i<=num;i++){
        errorVal[arr[i]]
      }
    }
    match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
      val=substr($0,RSTART,RLENGTH)
      gsub(/[[:space:]]+|\//,"",val)
      if(val in errorVal){
        countEr[val]++
      }
      else{
        countSu[val]++
      }
      val=""
    }
    END{
      print "Out:" ORS "Warnings:"
      for(i in countSu){
        print "\t"i,countSu[i]
        sumSu+=countSu[i]
      }
      print "Total:"sumSu
      print "Errors:"
      for(i in countEr){
        print "\t"i,countEr[i]
        sumEr+=countEr[i]
      }
      print "Total:"sumEr
    }' Input_file

Another gawk alternative, which relies on gawk's "true multidimensional arrays":

    $ cat tst.awk
    BEGIN {
        FS="[[:blank:]]/[[:blank:]]"
        OFS=" : "
    }
    FNR>1{
        gsub(/[[:blank:]]/, "", $2)
        gsub(/[[:blank:]]/, "", $3)
        a[$3][$2]++
    }
    END {
        #PROCINFO["sorted_in"]="@ind_str_desc"
        print "Out" OFS
        for(i in a) {
            print (i=="*"?"Warnings":"Errors") OFS
            t=0
            for(j in a[i]) {
                print "\t" j, a[i][j]
                t+=a[i][j]
            }
            print "Total", t
            t=0
        }
    }

    $ gawk -f tst.awk myFile

results in:

    Out :
    Warnings :
        St : 1
        Exp : 2
    Total : 3
    Errors :
        Ntp : 1
        Yt : 1
    Total : 2

The list of errors will not be limited to these two (Ntp and Yt); other errors may appear as well. A row is treated as an error if it has & in the third column (Statement 3).

@Rama, OK, please try my EDIT2 solution and let me know?

One doubt: what if we use | (the pipe symbol) as the delimiter between the Statement 1 and Statement 2 columns?

@Rama, then you may need to change the field separator to -F'|' (untested, since the solution was written only against the samples shown), change the match to match($0,/[[:space:]]+\|[^/]*[[:space:]]+\|/), and change the gsub to gsub(/[[:space:]]+|\|/,"",val) in my EDIT2 solution.

Thanks for the suggested change; it works with the | symbol.
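The pipe-delimiter question in the comments can be illustrated with a small sketch. It assumes that every column is separated by "|" (the comment only mentions the first two columns, so this is a guess at the layout), and it uses a field-separator regex rather than the match()/gsub() changes suggested above. The function name `summarize_pipe` and the sample rows are hypothetical.

```shell
#!/bin/sh
# Hypothetical pipe-delimited rows; assumes every column is separated by "|".
sample=$(mktemp)
cat > "$sample" <<'EOF'
The declaration is not done | Exp | * | This is expected
The declaration is starting/started | St | * | This is not expected
The declaration is not yet designed | Yt | & | This is a major one
The declaration is confirmed | Exp | * | This is okay
The declaration is not confirmed | Ntp | & | This is a major issue
EOF

# summarize_pipe FILE: tally column-2 names by the marker in the second-to-last field.
summarize_pipe() {
    # [|] keeps the pipe literal; in a multi-character FS regex a bare | is alternation.
    awk -F'[ \t]*[|][ \t]*' '
    NF >= 4 {
        if ($(NF-1) == "&") errors[$2]++
        else                warnings[$2]++
    }
    END {
        # for-in order is unspecified, so the per-name lines may come out in any order.
        for (name in warnings) { print "W", name, warnings[name]; w += warnings[name] }
        for (name in errors)   { print "E", name, errors[name];   e += errors[name] }
        print "Total warnings:", w + 0
        print "Total errors:", e + 0
    }' "$1"
}

summarize_pipe "$sample"
```

Letting the separator regex absorb the surrounding blanks means the fields arrive already trimmed, so no further gsub() cleanup is needed before comparing against "&".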