Awk: delete both lines if the second line is shorter than a specified length
I have a file that looks like this:
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:17671:14957 1:N:0:CTTGTA
ATCGTG
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG
Each bucket starts with the ">" symbol. I want to keep a bucket only if its second line has a length >= 15.
So my ideal output would be:
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG
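I have looked at various posts on Stack Overflow but still haven't had any success. Thanks in advance.
Could you please try the following. Explanation: each line of the code below carries a detailed comment.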
awk ' ##Starting awk program from here.
/^>/{ ##Checking condition if line starts with > then do following.
val=$0 ##Creating val which holds the current (header) line.
next ##next will skip all further statements from here.
}
length($0)>=15{ ##Checking condition if current line length is at least 15 (>=15) then do following.
print val ORS $0 ##Printing val ORS(new line) and current line here.
val="" ##Nullify val here.
}
' Input_file ##Mentioning Input_file name here
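A minimal variation on the answer above, assuming you would rather pass the cutoff on the command line than hard-code it: awk's standard -v option sets a variable before the program runs. The variable name min and the temporary sample file are illustrative only.

```shell
# Hypothetical demo: write the question's sample data to a temp file.
f=$(mktemp)
cat > "$f" <<'EOF'
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:17671:14957 1:N:0:CTTGTA
ATCGTG
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG
EOF

# Same idea as above; the cutoff comes from -v min=... instead of a literal 15.
out=$(awk -v min=15 '
  /^>/ { val = $0; next }                 # remember the header line
  length($0) >= min { print val ORS $0 }  # keep the pair if the sequence is long enough
' "$f")
printf '%s\n' "$out"
rm -f "$f"
```

Changing min=15 to another number changes the threshold without editing the program text.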
Assumptions:
- all data lines come in pairs
- the first line of each pair starts with ">" in the first column
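Since the solution depends on both assumptions, a quick check may be worth running first. This is a sketch assuming the strict two-lines-per-record layout described above (no wrapped sequence lines); it prints OK, or the offending line numbers followed by FAIL.

```shell
# Hypothetical demo: write the question's sample data to a temp file.
f=$(mktemp)
cat > "$f" <<'EOF'
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:17671:14957 1:N:0:CTTGTA
ATCGTG
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG
EOF

# Odd-numbered lines must be headers, even-numbered lines must be sequences.
report=$(awk '
  NR % 2 == 1 && !/^>/ { print "line " NR ": expected a header";   bad = 1 }
  NR % 2 == 0 &&  /^>/ { print "line " NR ": expected a sequence"; bad = 1 }
  END {
    # An odd line count means the last header has no sequence after it.
    if (NR % 2 == 1) { print "last header has no sequence line"; bad = 1 }
    print (bad ? "FAIL" : "OK")
  }
' "$f")
printf '%s\n' "$report"
rm -f "$f"
```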
$ cat stuff.dat
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:17671:14957 1:N:0:CTTGTA
ATCGTG
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG
One awk solution:
$ awk '
/^>/ { line1=$0
getline
line2=$0
if (length(line2) >= 15)
{ printf "%s\n%s\n", line1, line2}
}
' stuff.dat
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG
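The plain getline above does not check its return value. If the file ended on a header line, getline would return 0, $0 would still hold that header, and (since a header is longer than 15 characters) the header would be printed twice. A slightly more defensive sketch, assuming the same pairwise layout, reads the sequence into a variable and only tests it when getline succeeded. The trailing ">truncated_record" line is a hypothetical addition to the sample, there only to exercise that case.

```shell
f=$(mktemp)
# The question's sample, plus a hypothetical trailing header with no sequence.
cat > "$f" <<'EOF'
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:17671:14957 1:N:0:CTTGTA
ATCGTG
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG
>truncated_record
EOF

out=$(awk '
  /^>/ {
    line1 = $0
    # "getline var" returns 1 on success, 0 at end of file, -1 on error;
    # it fills line2 without touching $0.
    if ((getline line2) > 0 && length(line2) >= 15)
      printf "%s\n%s\n", line1, line2
  }
' "$f")
printf '%s\n' "$out"
rm -f "$f"
```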
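This might work for you (GNU sed):
If a line starts with ">", append the next line (N); then, if the regexp \n.{15}, which requires 15 or more characters after the newline, fails to match, delete the pair (!d).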
sed -E '/^>/{N;/\n.{15}/!d}' file
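If the cutoff needs to vary, one option (a sketch assuming GNU sed and a POSIX shell; the variable name min is hypothetical) is to splice a shell variable into the interval expression .{N}. Keeping the rest of the script in single quotes also stops an interactive bash from treating the ! as history expansion.

```shell
# Hypothetical demo: write the question's sample data to a temp file.
f=$(mktemp)
cat > "$f" <<'EOF'
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
GAATCTTCATGTGAGGAACAGAATTCAGC
>NB501365:508:HJF2HBGXF:1:12102:17671:14957 1:N:0:CTTGTA
ATCGTG
>NB501365:508:HJF2HBGXF:1:12102:14401:14957 1:N:0:CTTGTA
CCCCCCCCCGGGGCTCGGGGGGGCTGG
EOF

min=15
# Close the single quotes only around "$min"; the sed script becomes
# /^>/{N;/\n.{15}/!d} and every surviving pair is auto-printed.
out=$(sed -E '/^>/{N;/\n.{'"$min"'}/!d}' "$f")
printf '%s\n' "$out"
rm -f "$f"
```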