Awk: classify a column and count the warnings in it


I have a file named out.txt that looks like this:

Statement 1                        Statement 2  Statement 3    Statement 4
The declaration is not done         /   Exp     /   *       /  This is expected
The declaration is starting/started /   St      /   *       /  This is not expected
The declaration is not yet designed /   Yt      /   &       /  This is a major one
The declaration is confirmed        /   Exp     /   *       /  This is okay
The declaration is not confirmed    /   Ntp     /   &       /  This is a major issue
I need to summarize and classify the lines by column 3 (Statement 3): a * counts as a warning and a & counts as an error, as shown below:

Out:
Warnings:
    Exp : 2
    St  : 1
Total : 3
Errors:
    Yt : 1
    Ntp: 1
Total :2
I tried the following code, but it does not produce exactly that output:

#!/bin/bash
echo " " ;
File="out.txt"
for z in out.txt;
do
if grep -q "&" $z/"$File"; then
echo "$z:";
awk -F' / ' 
     { a[$2]++ }
     END{ for(j in a){ print j, a[j]; s=s+a[j] };
 print "Total :", s}' out.txt
else 
echo "$z:";
done
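EDIT2: Since the OP confirmed that there are no fixed keywords for errors, and that an error should instead be decided by the & marker in the second-to-last field of each line, try the following: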

awk -F'/' '
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+|\//,"",val)
  str=$(NF-1)
  gsub(/ +/,"",str)
  if(str=="&"){
     countEr[val]++
  }
  else{
     countSu[val]++
  }
  val=str=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
     print "\t"i,countSu[i]
     sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
     print "\t"i,countEr[i]
     sumEr+=countEr[i]
  }
  print "Total:"sumEr
}' Input_file
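
A quick check of why the code above appears to take the Statement 2 value from match() rather than from $2 (this is my reading of the answer, not something its author states): with -F'/', the slash inside "starting/started" adds an extra field, so $2 is no longer Statement 2 on that line, while $(NF-1) still points at Statement 3. The two sample lines are copied from the question; demo_fields is a hypothetical helper name used only for this illustration.

```shell
# Show NF, $2 and $(NF-1) for two sample lines when splitting on "/".
demo_fields() {
  printf '%s\n' \
    'The declaration is not done         /   Exp     /   *       /  This is expected' \
    'The declaration is starting/started /   St      /   *       /  This is not expected' |
  awk -F'/' '{
    second = $2; marker = $(NF-1)
    gsub(/ +/, "", second)   # strip the padding spaces
    gsub(/ +/, "", marker)
    print NF, second, marker
  }'
}

demo_fields
```

The first line splits into 4 fields with $2 equal to Exp, while the second splits into 5 fields and $2 becomes "started", which is why a plain $2 would miscount that row.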


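EDIT: A generic solution in which all error names are passed in a variable, so the conditions do not have to be written out by hand as in my previous solution. Written and tested only with GNU awk against the shown samples; please try the following: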

awk -v errors="Ntp,Yt"  '
BEGIN{
  num=split(errors,arr,",")
  for(i=1;i<=num;i++){
     errorVal[arr[i]]
  }
}
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+|\//,"",val)
  if(val in errorVal){
     countEr[val]++
  }
  else{
     countSu[val]++
  }
  val=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
     print "\t"i,countSu[i]
     sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
     print "\t"i,countEr[i]
     sumEr+=countEr[i]
  }
  print "Total:"sumEr
}'  Input_file

Another gawk alternative, relying on gawk's "true multidimensional arrays":
$ cat tst.awk

BEGIN {
  FS="[[:blank:]]/[[:blank:]]"
  OFS=" : "
}
FNR>1{
   gsub(/[[:blank:]]/, "", $2)
   gsub(/[[:blank:]]/, "", $3)
   a[$3][$2]++
}
END {
  #PROCINFO["sorted_in"]="@ind_str_desc"
  print "Out" OFS
  for(i in a) {
    print (i=="*"?"Warnings":"Errors") OFS
    t=0
    for(j in a[i]) {
      print "\t" j, a[i][j]
      t+=a[i][j]
    }
    print "Total", t
    t=0
  }
}
$ gawk -f tst.awk myFile
results in:

Out :
Warnings :
        St : 1
        Exp : 2
Total : 3
Errors :
        Ntp : 1
        Yt : 1
Total : 2
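
Arrays of arrays such as a[$3][$2] are a gawk extension, and the order of a for (i in a) loop is unspecified, which is why the groups and rows above can come out in a different order from the one requested in the question. If gawk is not available, the same idea can be sketched in POSIX awk with a composite key a[$3, $2] (joined internally with SUBSEP). The following is only a sketch under some assumptions: summarize is a hypothetical helper name, the separator is assumed to be a slash with whitespace on both sides, the first line is assumed to be a header, and rows are printed in first-seen order.

```shell
# Hypothetical helper: summarize FILE prints warning and error counts.
summarize() {
  awk '
  BEGIN { FS = "[ \t]+/[ \t]+" }          # a slash with whitespace on both sides
  FNR > 1 {                               # skip the header line
    code = $2; mark = $3
    if (!((mark, code) in count))         # remember first-seen order per marker
      keys[mark] = keys[mark] (keys[mark] == "" ? "" : ",") code
    count[mark, code]++                   # composite key, joined with SUBSEP
  }
  function report(mark, title,    n, i, list, total) {
    print title ":"
    n = split(keys[mark], list, ",")
    for (i = 1; i <= n; i++) {
      print "\t" list[i] " : " count[mark, list[i]]
      total += count[mark, list[i]]
    }
    print "Total : " (total + 0)
  }
  END {
    print "Out:"
    report("*", "Warnings")
    report("&", "Errors")
  }' "$1"
}

# Usage with the sample data from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Statement 1                        Statement 2  Statement 3    Statement 4
The declaration is not done         /   Exp     /   *       /  This is expected
The declaration is starting/started /   St      /   *       /  This is not expected
The declaration is not yet designed /   Yt      /   &       /  This is a major one
The declaration is confirmed        /   Exp     /   *       /  This is okay
The declaration is not confirmed    /   Ntp     /   &       /  This is a major issue
EOF
summarize "$sample"
rm -f "$sample"
```

Rows whose third column is neither * nor & are counted but never reported in this sketch; with gawk, uncommenting the PROCINFO["sorted_in"] line in tst.awk is the other way to get a predictable order.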

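The list of errors will not be just these two **(Ntp and Yt)**; other errors may appear as well. If there is a & in the third column **(Statement 3)**, the line counts as an error.
@Rama, OK, please try my EDIT2 solution and let me know?
One doubt: what if we use | (the pipe symbol) as the delimiter between the Statement 1 and Statement 2 columns?
@Rama, then you may need to change the field separator to
-F'|'
(untested, since the solution was written only against the shown samples), change the match call to
match($0,/[[:space:]]+\|[^|]*[[:space:]]+\|/)
and change the gsub call to
gsub(/[[:space:]]+|\|/,"",val)
in my EDIT2 solution.
Thanks for the suggested changes; it works with the | symbol.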
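
Putting the changes from the comments together, the pipe-delimited variant of EDIT2 might look like the sketch below. It is my assembly of the suggested edits, not code posted in the thread: count_by_marker is a hypothetical helper name, the sample rows are the question's data with / replaced by |, and the header line is assumed to contain no pipes (otherwise it would be counted as a data row).

```shell
# Hypothetical helper: count_by_marker FILE, for "|"-delimited input.
count_by_marker() {
  awk -F'|' '
  match($0, /[[:space:]]+\|[^|]*[[:space:]]+\|/) {   # first " | value | " span
    val = substr($0, RSTART, RLENGTH)
    gsub(/[[:space:]]+|\|/, "", val)                 # keep only the Statement 2 value
    str = $(NF-1)                                    # Statement 3 marker
    gsub(/ +/, "", str)
    if (str == "&") countEr[val]++
    else            countSu[val]++
  }
  END {
    print "Out:" ORS "Warnings:"
    for (i in countSu) { print "\t" i, countSu[i]; sumSu += countSu[i] }
    print "Total:" (sumSu + 0)
    print "Errors:"
    for (i in countEr) { print "\t" i, countEr[i]; sumEr += countEr[i] }
    print "Total:" (sumEr + 0)
  }' "$1"
}

# Usage with pipe-delimited sample data in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Statement 1                        Statement 2  Statement 3    Statement 4
The declaration is not done         |   Exp     |   *       |  This is expected
The declaration is starting/started |   St      |   *       |  This is not expected
The declaration is not yet designed |   Yt      |   &       |  This is a major one
The declaration is confirmed        |   Exp     |   *       |  This is okay
The declaration is not confirmed    |   Ntp     |   &       |  This is a major issue
EOF
count_by_marker "$sample"
rm -f "$sample"
```

As in the original answer, the rows inside each group are printed in whatever order awk's for-in loop yields, so only the counts and totals are stable across awk implementations.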