Awk: delete both lines if the second line is shorter than a given length


I have a file that looks like:

>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:17671:14957 1:N:0:CTTGTA
ATCGTG
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG
Each bucket starts with a > symbol. I want to keep a bucket only if its second line has a length of 15 or more (>= 15).

So my ideal output would be:

>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG

I have looked at different posts on Stack Overflow, but still without success. Thanks in advance.

Could you try the following? A detailed explanation of each step is given in the comments:

awk '                  ##Starting awk program from here.
/^>/{                  ##Checking condition if line starts with > then do following.
  val=$0               ##Creating val which has current line.
  next                 ##next will skip all further statements from here.
}
length($0)>=15{        ##Checking condition if current line length is 15 or more then do following.
  print val ORS $0     ##Printing val ORS(new line) and current line here.
  val=""               ##Nullify val here.
}
' Input_file           ##Mentioning Input_file name here
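
If the threshold needs to vary, it can be passed in with -v instead of being hard-coded. This is only a minimal sketch of the same logic; the function name filter_pairs and the variable names min and hdr are hypothetical and not part of the answer above.

```shell
# filter_pairs MIN FILE: hypothetical helper, not from the original answer.
# Prints a ">" header and the line after it only when that line
# has at least MIN characters.
filter_pairs() {
  awk -v min="$1" '
    /^>/ { hdr = $0; next }     # remember the header, read the next line
    length($0) >= min {         # the sequence line is long enough
      print hdr ORS $0          # print the header, a newline, the sequence
      hdr = ""                  # reset the stored header
    }
  ' "$2"
}
```

It would be called as: filter_pairs 15 Input_file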
Assumptions:

  • all data lines come in pairs
  • the first line of each pair starts with > in the first column
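
If you are not sure these assumptions hold for your file, they can be checked first. A rough sketch, assuming a strict header/sequence alternation; the function name check_pairs and the messages are hypothetical:

```shell
# check_pairs FILE: hypothetical helper. Returns non-zero if the file is not
# a strict alternation of ">" header lines and sequence lines.
check_pairs() {
  awk '
    NR % 2 == 1 && !/^>/ { printf "line %d: expected a header line\n", NR; bad = 1 }
    NR % 2 == 0 &&  /^>/ { printf "line %d: expected a sequence line\n", NR; bad = 1 }
    END {
      if (NR % 2) { print "file ends with an unpaired line"; bad = 1 }
      exit bad                  # 0 when every check passed
    }
  ' "$1"
}
```

For example: check_pairs stuff.dat && echo "pairs look fine"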
Sample data:

$ cat stuff.dat
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:17671:14957 1:N:0:CTTGTA
ATCGTG
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG
One awk solution:

$ awk '
/^>/ { line1=$0
       getline
       line2=$0
       if (length(line2) >= 15)
          { printf "%s\n%s\n", line1, line2}
     }
' stuff.dat
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG
This might work for you (GNU sed):

sed -E '/^>/{N;/\n.{15}/!d}' file

If a line starts with >, append the next line (N); then, if the regexp matching 15 or more characters in the appended line fails, delete the pair (d).
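
The same sed script can be spread over several lines with one comment per step, which may make it easier to follow. This is only a sketch, assuming GNU sed as in the answer; the wrapper name sed_filter is hypothetical.

```shell
# sed_filter FILE: hypothetical wrapper around the one-liner, assuming GNU sed.
sed_filter() {
  sed -E '
    # only act on header lines
    /^>/ {
      # append the next input line to the pattern space
      N
      # delete the pair unless at least 15 characters follow the newline
      /\n.{15}/!d
    }
  ' "$1"
}
```

Usage: sed_filter file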