awk command to filter data that occurs three or more times

I have a tab-delimited dataset that looks like this:

A  B  C  D
1  aaa 1 2
1  aaa 3 4
1  aaa 5 6
1  bbb 7 8
1  ccc 9 1
1  ccc 2 3
1  ddd 4 5
1  ddd 6 7
1  ddd 8 9
1  ddd 1 2
Expected output:

A  B  C  D
1  aaa 1 2
1  aaa 3 4
1  aaa 5 6
1  ddd 4 5
1  ddd 6 7
1  ddd 8 9
1  ddd 1 2
I tried this:

awk '++a[$2]>3' test.tsv test.tsv > test-2.tsv
Undesired output:

1   ddd 1   2
1   aaa 1   2
1   aaa 3   4
1   aaa 5   6
1   ccc 2   3
1   ddd 4   5
1   ddd 6   7
1   ddd 8   9
1   ddd 1   2

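You can try this two-pass awk (in bash, test.tsv{,} expands to test.tsv test.tsv, so the file is read twice):

awk -F'\t' 'FNR==NR{freq[$2]++; next} freq[$2]>=3' test.tsv{,}
1 aaa 1 2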
1 aaa 3 4
1 aaa 5 6
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
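
The expected output in the question keeps the header row, while the two-pass command above drops it, because the header's second field (B) occurs only once. Below is a minimal sketch of one way to keep it, assuming the header is always the first line of the file. The temporary sample file and the variable names sample and two_pass_out are made up for illustration; the data is copied from the question.

```shell
# Build a small tab-separated sample file (hypothetical temp file, data from the question).
sample=$(mktemp)
printf 'A\tB\tC\tD\n1\taaa\t1\t2\n1\taaa\t3\t4\n1\taaa\t5\t6\n1\tbbb\t7\t8\n1\tccc\t9\t1\n1\tccc\t2\t3\n1\tddd\t4\t5\n1\tddd\t6\t7\n1\tddd\t8\t9\n1\tddd\t1\t2\n' > "$sample"

# Pass 1 (FNR==NR): count how often each value of field 2 occurs.
# Pass 2: print the header (FNR==1) and every row whose key was counted 3 or more times.
two_pass_out=$(awk -F'\t' 'FNR==NR { freq[$2]++; next } FNR==1 || freq[$2]>=3' "$sample" "$sample")
printf '%s\n' "$two_pass_out"

rm -f "$sample"
```

Naming the file twice on the command line does the same job as the brace expansion, but also works in shells that do not support it.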
With the samples you have shown, and reading Input_file only once, could you please try the following? It was written and tested in GNU awk.

awk '
BEGIN{ FS=OFS="\t" }
FNR==1{
  print
  next
}
{
  count[$2]++
  line[$2]=(line[$2]?line[$2] ORS:"")$0
}
END{
  for(i in count){
    if(count[i]>=3){
       print line[i]
    }
  }
}' Input_file
Explanation: a detailed explanation of the above.

awk '                   ##Start of the awk program.
BEGIN{ FS=OFS="\t" }    ##BEGIN block: runs once, before any input is read.
                        ##Set both the input (FS) and output (OFS) field separators to a tab.
FNR==1{                 ##If this is the first line of the file (the header):
  print                 ##print it as-is,
  next                  ##and skip the remaining rules for this line.
}
{
  count[$2]++           ##Count how many times each value of the 2nd field occurs.
  line[$2]=(line[$2]?line[$2] ORS:"")$0
                        ##Append the current line to the lines already stored for this 2nd-field value, separated by ORS (a newline).
}
END{                    ##END block: runs after all input has been read.
  for(i in count){      ##Loop over every distinct 2nd-field value (in no guaranteed order).
    if(count[i]>=3){    ##If that value occurred 3 or more times:
       print line[i]    ##print all of its stored lines.
    }
  }
}' Input_file           ##Name of the input file.