Bash awk: checking the order of the header row in text files
In the bash below, I am trying to use awk to verify that the order of the headers is exactly the same between the tab-delimited files (key has the order of the fields, and there are usually 3 text files in the directory). If the order is correct, or a match is found between the files, then "file has expected order of fields" is printed. If the order does not match between the files, then "file order of $i is not correct" is printed, where $i is the field that is out of order, using key as the reference order. Thank you :)
key
Index Chr Start End Ref Alt Inheritance Score
file1.txt
Index Chr Start End Ref Alt Inheritance Score
1 1 10 100 A - . 2
file2.txt
Index Chr Start End Ref Alt Inheritance
1 1 10 100 A - . 2
2 1 20 100 A - . 5
file3.txt
Index Chr Start End Ref Alt Inheritance
1 1 10 100 A - . 2
2 1 20 100 A - . 5
3 1 75 100 A - . 2
4 1 25 100 A - . 5
awk
for f in /home/cmccabe/Desktop/validate/*.txt ; do
bname=`basename $f`
awk '
FNR==NR {
order=(awk '!seen[$0]++ {lines[i++]=$0}
END {for (i in lines) if (seen[lines[i]]==1) print lines[i]})'
k=(awk '!seen[$0]++ {lines[i++]=$0}
END {for (i in lines) if (seen[lines[i]]==1) print lines[i]})'
if($order==$k) print FILENAME " has expected order of fields"
else
print FILENAME " order of $i is not correct"
}' key $f
done
desired output
/home/cmccabe/Desktop/validate/file1.txt has expected order of fields
/home/cmccabe/Desktop/validate/file2.txt order of Score is not correct
/home/cmccabe/Desktop/validate/file3.txt order of Score is not correct
Given those inputs, you can do the following:
awk 'FNR==NR {hn=split($0,header); next}   # key file: store the expected header fields
     FNR==1 {n=split($0,fh)                 # header row of each file being checked
             for (i=1; i<=hn; i++)
                 if (fh[i]!=header[i]) {
                     printf "%s: order of %s is not correct\n", FILENAME, header[i]
                     next}
             if (hn==n)
                 print FILENAME, "has expected order of fields"
             else
                 print FILENAME, "has extra fields"
             next
     }' key file{1..3}.txt
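The above uses GNU awk for nextfile, for efficiency. With other awks, just delete that statement and accept that the entire contents of each file will be read.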
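The script shown earlier in this thread calls next rather than nextfile, so here is a minimal sketch of what a nextfile-based version could look like. The function name check_headers and the bad flag are illustrative and do not come from the original answer. It also assumes the key file holds exactly one header line, as the asker states.

```shell
#!/usr/bin/env bash
# Sketch only: compare the first line of each file with the header in the key file.
# check_headers KEY FILE...   (the function name is hypothetical)
check_headers() {
    awk '
    FNR == NR { hn = split($0, header); next }   # key file: remember the expected fields
    FNR == 1 {
        n = split($0, fh)                        # header fields of the file being checked
        bad = 0
        for (i = 1; i <= hn; i++)
            if (fh[i] != header[i]) { bad = i; break }
        if (bad)
            printf "%s: order of %s is not correct\n", FILENAME, header[bad]
        else if (n == hn)
            print FILENAME, "has expected order of fields"
        else
            print FILENAME, "has extra fields"
        nextfile                                 # header checked, skip the data rows
    }' "$@"
}
```

It could be called as check_headers key file1.txt file2.txt file3.txt. Only the first mismatching field is reported per file, which matches the messages in the question.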
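Your example doesn't include a case where a header appears in a file but isn't present in key, so I'm assuming that can't happen and you don't need the script to handle it. Do all of the files have only one line?
No, sorry, there are multiple lines in each text file and the length can vary. I will update the post, but in each of the text files the header is always row 1. The key has only one line. Thank you :).
You are confusing awk with shell. Awk is not shell. You can learn awk from the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
Thank you very much for all your help :) I am learning awk and reading Effective Awk Programming, and it is helpful but there is a lot :), not to mention his great book on shell. Thank you :).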
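To illustrate the comment about mixing up awk and shell, here is a rough sketch of one way to split the work: the shell loops over the files and reads the key, and awk inspects a single header per call. The function name validate_dir and the awk variables expected and name are hypothetical. The sketch assumes a one-line key file whose header contains no backslashes, since awk -v interprets escape sequences.

```shell
#!/usr/bin/env bash
# Sketch only: validate_dir KEY_FILE DIR   (names are illustrative)
validate_dir() {
    local key_file="$1" dir="$2" expected f
    # Shell part: read the single header line of the key file.
    IFS= read -r expected < "$key_file"
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue        # the glob matched nothing
        # awk part: shell values are handed over with -v, not expanded inside the script.
        awk -v expected="$expected" -v name="$f" '
        NR == 1 {
            hn = split(expected, want)
            split($0, got)
            for (i = 1; i <= hn; i++)
                if (got[i] != want[i]) {
                    print name " order of " want[i] " is not correct"
                    exit               # report the first mismatch only
                }
            print name " has expected order of fields"
            exit                       # only the header row matters
        }' "$f"
    done
}
```

Calling validate_dir /home/cmccabe/Desktop/validate/key /home/cmccabe/Desktop/validate would print one line per .txt file in the format the question asks for. Unlike the answer above, this version starts one awk process per file and does not report extra fields.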