How do I find a sequence of characters that matches an identifier in Linux/Unix?


I have a FASTA file named mytext.fasta:

>lcl|NW_001820834.1_gene_4 [locus_tag=SS1G_01081] [db_xref=GeneID:5493597] [partial=5',3'] [location=complement(<6452..>8801)] [gbkey=Gene]
ATGCAATTGGCAGCAGTCCTAAGCCTCGTGGGCTTGGTTACGGCTCAATGTCCGTACGGATTTGACACAC
CACTTCAAAAGCGTGAATCTATTGATGCTCAAGCCAGTAGTTCTAGTTTCTTGAATCAATTCACAATTAA
CGATACCGATGCACACTTTACCACCGACGCAGGTGGGCCTATGCAAGAGGACACTAGTTTGAAAGCTGGG
>lcl|NW_001820834.1_gene_5 [locus_tag=SS1G_01082] [db_xref=GeneID:5493601] [partial=5',3'] [location=<9695..>10785] [gbkey=Gene]
ATGTTTTCCGGTCCCCAGAAACTTGGCAACGCCAAACAAAAATCAATTGGCCTCGCTTGTCACACAATTA
GTCCCCACGAAGCCTTGTACAAACTAGCCACTGGCTCGTCCCGGACCATTAGGGCAATGTTCAACAGAGA
>lcl|NW_001820834.1_gene_6 [locus_tag=SS1G_01083] [db_xref=GeneID:5494096] [partial=5',3'] [location=<12203..>15199] [gbkey=Gene]
ATGAGAGGCAAGCTTGGTGTCACAGTTGCTGCATTTGCGACGGCATTTCTAAATACGACACTTGCTCAAG
ACTCAACATCATCACAAGCGGATGCGGATACTACCACAAGTTATTGTCCCGTTTACACGCTCACAGCTTC
AGTTGATGCCAGCGCACCTATTATCCCAAACATCCACGATCCGCAGGCAATTAATCCACAAGATGTTTGT
CCGGGGTATACTGCATCCAATGTGAAGCGAACCTCTCACGGATTGACGGCTTCTCTGTCATTGGCTGGTG
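When I grep for an identifier such as SS1G_01082, I want to get the whole record rather than only the matching line: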

>lcl|NW_001820834.1_gene_5 [locus_tag=SS1G_01082] [db_xref=GeneID:5493601] [partial=5',3'] [location=<9695..>10785] [gbkey=Gene]
ATGTTTTCCGGTCCCCAGAAACTTGGCAACGCCAAACAAAAATCAATTGGCCTCGCTTGTCACACAATTA
GTCCCCACGAAGCCTTGTACAAACTAGCCACTGGCTCGTCCCGGACCATTAGGGCAATGTTCAACAGAGA

As you can see, each sequence in this file starts with >, so I would like grep to return the full sequence for the match. How can this be done?

This is easier with gnu awk and a custom RS:

awk -v RS='(^|\n)>' '/SS1G_01082/{print RT $0}' file

>lcl|NW_001820834.1_gene_5 [locus_tag=SS1G_01082] [db_xref=GeneID:5493601] [partial=5',3'] [location=<9695..>10785] [gbkey=Gene]
ATGTTTTCCGGTCCCCAGAAACTTGGCAACGCCAAACAAAAATCAATTGGCCTCGCTTGTCACACAATTA
GTCCCCACGAAGCCTTGTACAAACTAGCCACTGGCTCGTCCCGGACCATTAGGGCAATGTTCAACAGAGA
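
For awk implementations other than gnu awk, where a regex RS and the RT variable may not be available, a flag-based filter is one possible alternative. This is a minimal sketch and not part of the answer above; the file name sample.fasta and its shortened records are made up for illustration.

```shell
# Build a small sample file (hypothetical, shortened records).
cat > sample.fasta <<'EOF'
>lcl|NW_001820834.1_gene_4 [locus_tag=SS1G_01081]
ATGCAATTGG
CACTTCAAAA
>lcl|NW_001820834.1_gene_5 [locus_tag=SS1G_01082]
ATGTTTTCCG
GTCCCCACGA
>lcl|NW_001820834.1_gene_6 [locus_tag=SS1G_01083]
ATGAGAGGCA
EOF

# On each header line (starting with ">"), set the flag p to 1 if the header
# contains the identifier and to 0 otherwise. The bare pattern p then prints
# every line for as long as the flag is set.
result=$(awk '/^>/ { p = /SS1G_01082/ } p' sample.fasta)
printf '%s\n' "$result"
```

Because the flag is only re-evaluated on header lines, the sequence lines that follow a matching header are printed until the next header appears. The same program reads standard input when no file name is given, so it can also sit at the end of a pipeline.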

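@anbhava Thanks, but I am not sure why this does not work when I pipe into it. My command is: esearch -db nuccore -q 'SS1G_01082[gene]' | efilter -source refseq -molecule genomic | efetch -format gene_fasta | awk -v RS= '/SS1G_01082/'. The last part (awk -v RS= '/SS1G_01082/') should filter out the desired sequence, but it gives me everything.

If the esearch -db nuccore -q 'SS1G_01082[gene]' | efilter -source refseq -molecule genomic | efetch -format gene_fasta command gives exactly the same output as shown above, i.e. with an empty line after each record, then this awk should work.

No, the output has no empty lines. I have updated my question.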
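
The exchange above hinges on how awk treats an empty RS: it switches to paragraph mode, where records are separated by blank lines. A small sketch with made-up two-record input (the headers g4 and g5 are placeholders, not real efetch output) suggests why the filter prints everything when the stream has no blank lines:

```shell
# Input without blank lines: paragraph mode sees a single record,
# so a pattern that matches anywhere in it prints the whole input.
no_blank=$(printf '>g4 SS1G_01081\nAAAA\n>g5 SS1G_01082\nCCCC\n' | awk -v RS= '/SS1G_01082/')

# Same data with a blank line after each record: only the matching
# record is printed.
with_blank=$(printf '>g4 SS1G_01081\nAAAA\n\n>g5 SS1G_01082\nCCCC\n\n' | awk -v RS= '/SS1G_01082/')

printf 'without blank lines:\n%s\n' "$no_blank"
printf 'with blank lines:\n%s\n' "$with_blank"
```

This is why a separator based on the > header character is the better fit for FASTA output that has no blank lines between records.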