Awk: categorize a column and count the warnings in that column
I have a file named out.txt that looks like this:
Statement 1 Statement 2 Statement 3 Statement 4
The declaration is not done / Exp / * / This is expected
The declaration is starting/started / St / * / This is not expected
The declaration is not yet designed / Yt / & / This is a major one
The declaration is confirmed / Exp / * / This is okay
The declaration is not confirmed / Ntp / & / This is a major issue
Out:
Warnings:
Exp : 2
St : 1
Total : 3
Errors:
Yt : 1
Ntp: 1
Total :2
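I need to summarize and classify the rows by column 3 (Statement 3): a * marks the row as a warning and an & marks it as an error. The expected result is the Out: block shown above.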
I tried the following code, but it does not produce the exact output:
#!/bin/bash
echo " " ;
File="out.txt"
for z in out.txt;
do
if grep -q "&" $z/"$File"; then
echo "$z:";
awk -F' / '
{ a[$2]++ }
END{ for(j in a){ print j, a[j]; s=s+a[j] };
print "Total :", s}' out.txt
else
echo "$z:";
done
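EDIT2: Since the OP confirmed that there are no keywords identifying errors, the decision should be based on the & in the second-to-last field of each line. In that case, try the following: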
awk -F'/' '
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)    # first " / name /" chunk, i.e. the Statement 2 column
  gsub(/[[:space:]]+|\//,"",val)   # strip whitespace and slashes, leaving only the name
  str=$(NF-1)                      # second-to-last field holds the * or & marker
  gsub(/ +/,"",str)
  if(str=="&"){
    countEr[val]++
  }
  else{
    countSu[val]++
  }
  val=str=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
    print "\t"i,countSu[i]
    sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
    print "\t"i,countEr[i]
    sumEr+=countEr[i]
  }
  print "Total:"sumEr
}' Input_file
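The same classification can also be sketched without match(), by splitting each line on the literal " / " separator so that column 3 is simply $3. This is only an illustrative variant, not the answerer's code: the function name summarize_statements is hypothetical, and it assumes every separator is exactly space-slash-space and that the first line is a header. Names are printed in first-seen order, because the order of a for (i in array) loop is unspecified in awk.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): summarize_statements FILE
# Assumes " / " separates the columns and line 1 is a header.
summarize_statements() {
    awk -F' / ' '
    NR > 1 && NF >= 4 {
        # Column 3 decides the category: & is an error, anything else a warning.
        kind = ($3 == "&") ? "Errors" : "Warnings"
        key = kind SUBSEP $2
        if (!(key in count)) {
            # Remember the first-seen order of names within each category.
            order[kind, ++n[kind]] = $2
        }
        count[key]++
        total[kind]++
    }
    END {
        print "Out:"
        split("Warnings Errors", kinds, " ")
        for (k = 1; k <= 2; k++) {
            kind = kinds[k]
            print kind ":"
            for (i = 1; i <= n[kind]; i++) {
                name = order[kind, i]
                print name " : " count[kind, name]
            }
            print "Total : " (total[kind] + 0)
        }
    }' "$1"
}
```

It would be called as summarize_statements out.txt. Splitting on " / " rather than on a bare / keeps the slash inside "starting/started" from creating an extra field.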
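EDIT: A generic solution in which all error names can be given in a variable, so the conditions no longer have to be written out by hand as in my previous solution. Written and tested only with GNU awk against the samples you have shown; please try the following: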
awk -v errors="Ntp,Yt" '
BEGIN{
  num=split(errors,arr,",")        # turn the comma-separated list into array entries
  for(i=1;i<=num;i++){
    errorVal[arr[i]]               # referencing the element creates the lookup key
  }
}
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)    # first " / name /" chunk, i.e. the Statement 2 column
  gsub(/[[:space:]]+|\//,"",val)   # strip whitespace and slashes, leaving only the name
  if(val in errorVal){
    countEr[val]++
  }
  else{
    countSu[val]++
  }
  val=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
    print "\t"i,countSu[i]
    sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
    print "\t"i,countEr[i]
    sumEr+=countEr[i]
  }
  print "Total:"sumEr
}' Input_file
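The lookup-set idea behind the errors variable can be seen in isolation in the short sketch below. It is an illustration under stated assumptions rather than part of the answer: the function name classify_by_name is hypothetical, the columns are assumed to be separated by exactly " / ", and the first line is assumed to be a header. It prints one line per record with the Statement 2 name and the category chosen from the list.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer):
#   classify_by_name "Ntp,Yt" FILE
# $1 = comma-separated error names, $2 = input file with " / " separators.
classify_by_name() {
    awk -F' / ' -v errors="$1" '
    BEGIN {
        # Build a membership set: is_error["Ntp"], is_error["Yt"], ...
        n = split(errors, names, ",")
        for (i = 1; i <= n; i++) is_error[names[i]] = 1
    }
    NR > 1 && NF >= 4 {
        # "in" tests for the key without creating a new array element.
        print $2, (($2 in is_error) ? "error" : "warning")
    }' "$2"
}
```

Adding a new error name then only means extending the list, for example classify_by_name "Ntp,Yt,Abc" out.txt, where Abc is a made-up name.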
Another gawk alternative, relying on gawk's "true multidimensional arrays":
$ cat tst.awk
BEGIN {
  FS="[[:blank:]]/[[:blank:]]"
  OFS=" : "
}
FNR>1{
  gsub(/[[:blank:]]/, "", $2)
  gsub(/[[:blank:]]/, "", $3)
  a[$3][$2]++
}
END {
  #PROCINFO["sorted_in"]="@ind_str_desc"
  print "Out" OFS
  for(i in a) {
    print (i=="*"?"Warnings":"Errors") OFS
    t=0
    for(j in a[i]) {
      print "\t" j, a[i][j]
      t+=a[i][j]
    }
    print "Total", t
    t=0
  }
}
$ gawk -f tst.awk myFile
results in:
Out :
Warnings :
St : 1
Exp : 2
Total : 3
Errors :
Ntp : 1
Yt : 1
Total : 2
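The list of errors will not be limited to these two (Ntp and Yt); other errors may appear as well. If the third column (Statement 3) contains &, the row counts as an error.
@Rama, okay, please try my EDIT2 solution and let me know.
One question: what if we use | (the pipe symbol) as the separator between the Statement 1 and Statement 2 columns?
@Rama, then you may need to change the field separator to -F'|' (untested, since the solution was written only for the samples shown), change the match call to match($0,/[[:space:]]+\|[^|]*[[:space:]]+\|/), and change the gsub call to gsub(/[[:space:]]+|\|/,"",val) in my EDIT2 solution.
Thanks for the suggested changes; it works with the | symbol.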
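For the pipe-separated case discussed in the comments, a minimal sketch could look like the following. It is not the EDIT2 code: the function name summarize_piped is hypothetical, and it assumes that every column separator in the file is " | " (space, pipe, space) and that the first line is a header. It only reports the two totals, to show how little changes once the separator is different.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): summarize_piped FILE
# Assumes " | " separates all columns and line 1 is a header.
summarize_piped() {
    # [|] matches a literal pipe; a bare | would mean alternation in a regex.
    awk -F' [|] ' '
    NR > 1 && NF >= 4 {
        if ($3 == "&") errors++; else warnings++
    }
    END {
        print "Warnings total : " (warnings + 0)
        print "Errors total : " (errors + 0)
    }' "$1"
}
```

Writing the separator as ' [|] ' avoids having to escape the pipe, which is the same concern behind the \| in the modified match and gsub calls above.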