awk command to keep only the rows whose key occurs three or more times
I have a tab-separated dataset that looks like this:
A B C D
1 aaa 1 2
1 aaa 3 4
1 aaa 5 6
1 bbb 7 8
1 ccc 9 1
1 ccc 2 3
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
Expected output:
A B C D
1 aaa 1 2
1 aaa 3 4
1 aaa 5 6
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
I tried this:
awk '++a[$2]>3' test.tsv test.tsv > test-2.tsv
Undesired output:
1 ddd 1 2
1 aaa 1 2
1 aaa 3 4
1 aaa 5 6
1 ccc 2 3
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
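You can try this two-pass awk (in bash, test.tsv{,} expands to test.tsv test.tsv, so the file is read twice):
awk -F'\t' 'FNR==NR{freq[$2]++; next} freq[$2]>=3' test.tsv{,}
1 aaa 1 2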
1 aaa 3 4
1 aaa 5 6
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
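Note that the two-pass command above drops the header line, because its second field (B) occurs only once, while the expected output keeps it. Below is a minimal sketch of one way to keep the header, assuming it is always the first line of the file. The function name filter_with_header is made up for illustration, and the file is passed twice explicitly so the command does not depend on bash brace expansion.

```shell
# Recreate the sample from the question as a tab-separated file.
printf '%s\t%s\t%s\t%s\n' \
    A B C D \
    1 aaa 1 2 \
    1 aaa 3 4 \
    1 aaa 5 6 \
    1 bbb 7 8 \
    1 ccc 9 1 \
    1 ccc 2 3 \
    1 ddd 4 5 \
    1 ddd 6 7 \
    1 ddd 8 9 \
    1 ddd 1 2 \
    > test.tsv

# Hypothetical helper: read the same file twice.
# Pass 1 (FNR==NR) counts how often each value of field 2 occurs.
# Pass 2 prints the header (FNR==1) and every row whose key was counted
# at least three times.
filter_with_header() {
    awk -F'\t' 'FNR==NR { freq[$2]++; next } FNR==1 || freq[$2] >= 3' "$1" "$1"
}

filter_with_header test.tsv
```

Because the second pass prints rows as it reads them, the output stays in the original input order.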
For the samples you have shown (reading Input_file only once), could you please try the following?
It was written and tested in GNU awk.
awk '
BEGIN{ FS=OFS="\t" }
FNR==1{
  print
  next
}
{
  count[$2]++
  line[$2]=(line[$2]?line[$2] ORS:"")$0
}
END{
  for(i in count){
    if(count[i]>=3){
      print line[i]
    }
  }
}' Input_file
Explanation: a detailed, commented version of the above.
awk '                                      ##Start of the awk program.
BEGIN{ FS=OFS="\t" }                       ##BEGIN block: set both FS and OFS to a tab.
FNR==1{                                    ##If this is the first line (the header), do the following.
  print                                    ##Print the header line as is.
  next                                     ##Skip all remaining rules for this line.
}
{
  count[$2]++                              ##Count how many times each value of the 2nd field occurs.
  line[$2]=(line[$2]?line[$2] ORS:"")$0    ##Append the current line to line[$2], separating lines with a newline.
}
END{                                       ##END block: runs after all input has been read.
  for(i in count){                         ##Loop over every key stored in count.
    if(count[i]>=3){                       ##If this key occurred three or more times, do the following.
      print line[i]                        ##Print all lines collected for this key.
    }
  }
}' Input_file                              ##Name of the input file.
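One caveat with the single-pass version: awk does not define the order in which for (i in count) visits the keys, so the groups may not come out in input order. Here is a sketch that records each key the first time it is seen and prints the groups in that order, assuming the same tab-separated layout with a header on the first line. The names order, n and filter_keep_order are introduced here for illustration only.

```shell
# Recreate the sample from the question as a tab-separated file.
printf '%s\t%s\t%s\t%s\n' \
    A B C D \
    1 aaa 1 2 \
    1 aaa 3 4 \
    1 aaa 5 6 \
    1 bbb 7 8 \
    1 ccc 9 1 \
    1 ccc 2 3 \
    1 ddd 4 5 \
    1 ddd 6 7 \
    1 ddd 8 9 \
    1 ddd 1 2 \
    > test.tsv

# Hypothetical helper: single pass that keeps groups in first-seen order.
filter_keep_order() {
    awk '
    BEGIN { FS = OFS = "\t" }
    FNR == 1 { print; next }                   # pass the header through
    {
        if (!($2 in count)) order[++n] = $2    # remember each new key once
        count[$2]++
        line[$2] = (($2 in line) ? line[$2] ORS : "") $0
    }
    END {
        # Walk the keys in the order they first appeared in the input.
        for (k = 1; k <= n; k++)
            if (count[order[k]] >= 3) print line[order[k]]
    }' "$1"
}

filter_keep_order test.tsv
```

This still buffers every line in memory until the END block, so for very large files the two-pass approach may be the better fit.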