Extracting data from a gb file using Biopython

I have a gb file and need to extract some specific features from it: the names and sizes of the protein-coding genes.

LOCUS       NC_008137              15318 bp    DNA     linear   MAM 15-APR-2009
DEFINITION  Phalanger interpositus mitochondrion, complete genome.
ACCESSION   NC_008137
VERSION     NC_008137.1  GI:108793518
DBLINK      Project: 17043
KEYWORDS    .
SOURCE      mitochondrion Phalanger interpositus (Stein's cuscus)
  ORGANISM  Phalanger interpositus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Metatheria; Diprotodontia; Phalangeridae; Phalanger.
REFERENCE   1  (bases 1 to 15318)
  AUTHORS   Munemasa,M., Nikaido,M., Donnellan,S., Austin,C.C., Okada,N. and
            Hasegawa,M.
  TITLE     Phylogenetic analysis of diprotodontian marsupials based on
            complete mitochondrial genomes
  JOURNAL   Genes Genet. Syst. 81 (3), 181-191 (2006)
   PUBMED   16905872
REFERENCE   2  (bases 1 to 15318)
  CONSRTM   NCBI Genome Project
  TITLE     Direct Submission
  JOURNAL   Submitted (12-JUN-2006) National Center for Biotechnology
            Information, NIH, Bethesda, MD 20894, USA
REFERENCE   3  (bases 1 to 15318)
  AUTHORS   Munemasa,M., Nikaido,M., Donnellan,S., Austin,C.C., Okada,N. and
            Hasegawa,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-NOV-2005) Tokyo Institute of Technology, Graduate
            School of Bioscience and Biotechnology; Nagatsuta-cho 4259-B-21,
            Midori-ku, Kanagawa 226-8501, Japan
COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
            reference sequence was derived from AB241057.
            Genome sequence lacks part of non-coding region.
            COMPLETENESS: full length.
FEATURES             Location/Qualifiers
     source          1..15318
                     /organism="Phalanger interpositus"
                     /organelle="mitochondrion"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:356347"
                     /tissue_type="liver"
                     /common="Stein's cuscus"
     tRNA            1..69
                     /product="tRNA-Phe"
     rRNA            72..1018
                     /product="s-rRNA"
                     /note="12S ribosomal RNA"
     tRNA            1020..1088
                     /product="tRNA-Val"
     rRNA            1089..2653
                     /product="l-rRNA"
                     /note="16S ribosomal RNA"
     tRNA            2654..2727
                     /product="tRNA-Leu"
                     /codon_recognized="UUR"
     gene            2729..3685
                     /gene="ND1"
                     /db_xref="GeneID:4117948"
     CDS             2729..3685
                     /gene="ND1"
                     /codon_start=1
                     /transl_table=2
                     /product="NADH dehydrogenase subunit 1"
                     /protein_id="YP_637062.1"
                     /db_xref="GI:108793519"
                     /db_xref="GeneID:4117948"
                     /translation="MFIINLLMYIIPILLAIAFLTLVERKALGYMQFRKGPNVVGPYG
                     LLQPIADGMKLFSKEPLQPVTSSTTMFIIAPTLALTLSLTMWTPLPMPHSLIDLNLGL
                     LFILALSGLSVYSILWSGWASNSKYALMGALRAVAQTISYEVTLAIILLSIMLINGSF
                     TLKNLITTQENMWLIITTWPLVMMWYVSTLAETNRAPLDLTEGESELVSGFNVEYAAG
                     PFAMFFLAEYANIMLMNAMTTILFLGSSINHNFTHLNTLSFMTKTIALTFLFLWVRAS
                     YPRFRYDQLMHLLWKNFLPMTLAMCLWFISIPIALSCIPPQI"
     misc_feature    2729..3682
                     /gene="ND1"
                     /note="NADH dehydrogenase; Region: NADHdh; cl00469"
                     /db_xref="CDD:186018"
     tRNA            3686..3751
                     /product="tRNA-Ile"
     tRNA            complement(3750..3821)
                     /product="tRNA-Gln"
     tRNA            3821..3878
                     /product="tRNA-Met"
     gene            3889..4932
                     /gene="ND2"
                     /db_xref="GeneID:4117949"
     CDS             3889..4932
                     /gene="ND2"
                     /codon_start=1
                     /transl_table=2
                     /product="NADH dehydrogenase subunit 2"
                     /protein_id="YP_637063.1"
                     /db_xref="GI:108793520"
                     /db_xref="GeneID:4117949"
                     /translation="MSPYILLIMLTSLLLGTSLTLFSNHWLTAWMGLEINTLAIIPMM
                     TYPNHPRATESAIKYFLTQSTASMMLMFAIINNAWMTNQWTLLQTSDQTSSTIMTLAL
                     AMKLGLAPFHFWVPEVTQGIPLTSGMILLTWQKIAPTSLMYQISPSLNMKILVMLALL
                     STILGGWGGLNQTHMRKILAYSSIAHMGWMTIIILINPTLTLLNLAIYITTTLTLFLA
                     LNHSSITKIKSLANLWNKSSSMTIVIALTLLSLGGLPPLTGFMPKWLILQELITYNNI
                     ATATMMAMSALLNLFFYMRIIYTTTLTMPPSINNSKLQWPHPQTKTTNIIPLLTIISS
                     FLLPLTPLSITLS"
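
I tried using SeqFeature and sub-features, but it did not work.

From this file I should get (ND1 and 2729..3685, ND2 and 3889..4932, ... and so on if there are more).

I am new to Biopython and would appreciate help with how to do this.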

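The GenBank file you posted is incomplete: parts of it are missing and it lacks the "//" terminator line, so the parser chokes when it tries to read it.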
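
A quick way to spot this kind of truncation before handing a file to the parser is to check that its last non-blank line is the "//" record terminator. The following is only a minimal sketch; the helper name has_genbank_terminator is made up for illustration and is not part of Biopython.

```python
def has_genbank_terminator(path):
    """Return True if the last non-blank line of the file is '//'.

    Hypothetical helper (not a Biopython function). A GenBank record
    ends with a line containing only '//', so a file whose final
    non-blank line is anything else was most likely cut short.
    """
    last = ""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped:            # remember the most recent non-blank line
                last = stripped
    return last == "//"
```

This only checks the end of the file; it does not validate anything else about the record.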

I obtained the correct, complete file for this mitochondrion.
Then (Python 3 code):

>>> from Bio import SeqIO
>>> arch = "C:/code/NC_008137.gbk"
>>> record = SeqIO.parse(arch, "genbank")
>>> rec = next(record)                       # there is only one record
>>> for f in rec.features:
...     if f.type == 'gene':
...         print(f.qualifiers['gene'], f.location)
...
['ND1'] [2728:3685]
['ND2'] [3888:4932]
['COX1'] [5365:6919]
['COX2'] [7052:7737]
['ATP8'] [7798:8005]
['ATP6'] [7959:8640]
['COX3'] [8639:9423]
['ND3'] [9488:9837]
['ND4L'] [9906:10203]
['ND4'] [10196:11574]
['ND5'] [11773:13582]
['ND6'] [13578:14082]
['CYTB'] [14155:15301]
>>>
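
Note that the locations printed above use Python-style 0-based, end-exclusive coordinates, so [2728:3685] corresponds to 2729..3685 in the file. Since the question asks specifically for protein-coding genes and their sizes, a variation along the following lines may be closer to what is needed. It is a sketch under some assumptions: the helper names load_record and cds_summary are made up here, "protein-coding" is taken to mean features of type CDS, and each CDS is assumed to carry a /gene qualifier (as in this record).

```python
def load_record(path):
    """Read the single record from a GenBank file (requires Biopython)."""
    # Imported lazily so that cds_summary() below has no hard dependency.
    from Bio import SeqIO
    return SeqIO.read(path, "genbank")


def cds_summary(features):
    """Return (gene name, 1-based start, end, length in bp) per CDS feature."""
    rows = []
    for f in features:
        if f.type != "CDS":        # skip tRNA, rRNA, gene, misc_feature, ...
            continue
        # Qualifier values are lists; fall back to "?" if /gene is absent.
        name = f.qualifiers.get("gene", ["?"])[0]
        start = int(f.location.start) + 1   # convert 0-based start to 1-based
        end = int(f.location.end)           # exclusive end equals 1-based end
        rows.append((name, start, end, len(f.location)))
    return rows
```

With the complete file this could be used as `for row in cds_summary(load_record("C:/code/NC_008137.gbk").features): print(*row)`. For ND1 the size works out to 3685 - 2728 = 957 bp.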