Parsing from a file into a dictionary in the correct order with Python

Tags: parsing, python-2.7, dictionary, biopython

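I have written some code to parse an EMBL file and dump specific regions of the file into a dictionary.

The keys of the dictionary are the labels of the regions I want to capture, and the value for each key is the region itself.

I then wrote another function that writes the contents of the dictionary to a text file.

However, the text file contains the information in a different order from the original EMBL file.

I can't work out why it does this. Is it because dictionaries are unordered? Is there any way around it?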

from Bio import SeqIO

s6633 = SeqIO.read("6633_seq.embl", "embl")

def make_dict_realgenes(x):
    dict = {}
    for i in range(len(x.features)):
        if x.features[i].type == 'CDS':
            if 'hypothetical' not in x.features[i].qualifiers['product'][0]:
                try:
                    if x.features[i].location.strand == -1:
                        x1 = x.features[i].location.end
                        y1 = x1 + 30
                        dict[str(x.features[i].qualifiers['product'][0])] =\
                             str(x[x1:y1].seq.reverse_complement())
                    else:
                        x2 = x.features[i].location.start
                        y2 = x2 - 30
                        dict[x.features[i].qualifiers['product'][0]] =\
                             str(x[y2:x2].seq)
                except KeyError:
                    if x.features[i].location.strand == -1:
                        x1 = x.features[i].location.end
                        y1 = x1 + 30
                        dict[str(x.features[i].qualifiers['translation'][0])] =\
                             str(x[x1:y1].seq.reverse_complement())
                    else:
                        x2 = x.features[i].location.start
                        y2 = x2 - 30
                        dict[x.features[i].qualifiers['translation'][0]] =\
                             str(x[y2:x2].seq)
    return dict

def rbs_file(dict):
    list = []
    c = 0
    for k, v in dict.iteritems():
        list.append(">" + k + " " + str(c) + "\n" + v + "\n")
        c = c + 1

    f = open("out.txt", "w")
    a = 0
    for i in list:
        f.write(i)
        a = a + 1

    f.close()

To preserve insertion order in a dictionary, use OrderedDict from the collections module. Try changing the top of your code to:

from collections import OrderedDict
from Bio import SeqIO

s6633 = SeqIO.read("6633_seq.embl", "embl")

def make_dict_realgenes(x):
    dict = OrderedDict()   
...

Also, I would advise against overwriting the built-in dict, since you can easily pick another name for your variable.
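
As a minimal sketch of this suggestion, the snippet below fills an OrderedDict with a few made-up (label, sequence) pairs and checks that iteration follows insertion order. The labels, the sequences and the name regions are hypothetical stand-ins for the parsed EMBL features; regions is used instead of dict so that the built-in is not shadowed. From Python 3.7 on a plain dict also preserves insertion order, so OrderedDict matters mainly on Python 2.7, as used in the question.

```python
from collections import OrderedDict

# Hypothetical (label, upstream sequence) pairs, listed in file order.
parsed = [
    ("geneC", "ATGCA"),
    ("geneA", "TTGAC"),
    ("geneB", "GGCAT"),
]

# Named "regions" rather than "dict" so the built-in type stays usable.
regions = OrderedDict()
for label, sequence in parsed:
    regions[label] = sequence

# Iteration follows insertion order, not alphabetical or hash order.
ordered_labels = list(regions.keys())
```

With this in place, the writing function can loop over regions.items() and the output keeps the order of the input file.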

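I refactored your code a little, and I suggest writing the output as it is produced while parsing the file, instead of relying on an OrderedDict.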

from Bio import SeqIO


output = open("out.txt", "w")

for seq in SeqIO.parse("CP001187.embl", "embl"):
    for feature in seq.features:
        if feature.type == "CDS":
            qualifier = (feature.qualifiers.get("product") or
                         feature.qualifiers.get("translation"))[0]
            if "hypothetical" not in qualifier:
                if feature.location.strand == -1: 
                    x1 = feature.location.end
                    x2 = x1 + 30
                    sequence = seq[x1:x2].seq.reverse_complement()
                else:
                    x1 = feature.location.start
                    x2 = x1 - 30
                    sequence = seq[x2:x1].seq

                output.write(">" + qualifier + "\n")
                output.write(str(sequence) + "\n")

                # You can always insert here to the OrderedDict anyway, e.g.
                # d[qualifier] = str(sequence)

output.close()

Also, in Python, for i in range(len(anything)) is a good candidate for refactoring.


There is also a cleaner way to output the sequences with Biopython. Append the sequences to a list instead of using a dict or an OrderedDict:

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

my_seqs = []

# Each time you generate a sequence, instead of writing to a file
# or inserting into a dict, do this (inside the feature loop, where
# sequence and qualifier are defined):
my_seqs.append(SeqRecord(sequence, id=qualifier, description=""))

# Once my_seqs is filled, it can be written in a single line:
SeqIO.write(my_seqs, "output.fas", "fasta")
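
The same list-based idea can be sketched without Biopython objects. A dictionary keyed by product name keeps only one entry per name, so two genes with the same product would overwrite each other, while a list of (label, sequence) pairs keeps both the order and the duplicates. The function name format_fasta and the sample records below are assumptions made for illustration, not part of the original code; the running index mirrors the counter used in rbs_file.

```python
def format_fasta(pairs):
    """Return FASTA-style text for (label, sequence) pairs, in list order."""
    lines = []
    for index, (label, sequence) in enumerate(pairs):
        lines.append(">{0} {1}\n{2}\n".format(label, index, sequence))
    return "".join(lines)


# Hypothetical records; "transposase" appears twice on purpose.
pairs = [
    ("transposase", "ATGCA"),
    ("DNA polymerase", "TTGAC"),
    ("transposase", "GGCAT"),
]

fasta_text = format_fasta(pairs)

# A dict built from the same pairs silently drops one of the duplicates.
unique_labels = len(dict(pairs))
```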

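Yes, dictionaries are unordered. If order matters, use a list or an OrderedDict.

Wow, thanks! But what is wrong with for i in range(len(anything))?

It's not Pythonic. If you want to loop over the elements, use for element in list_of_elements. If you need the index, use for i, element in enumerate(list_of_elements). There are other reasons too: for example, if list_of_elements is a generator, it does not have to have a length.

Thanks, that is what I wanted.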
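
The looping advice in the comments can be sketched as follows. The list features and the generator feature_stream are hypothetical stand-ins for x.features; the point is only to contrast direct iteration, enumerate, and the lack of len() on a generator.

```python
# Hypothetical feature types standing in for x.features.
features = ["CDS", "tRNA", "CDS"]

# Loop over the elements directly when the index is not needed.
cds_count = 0
for feature_type in features:
    if feature_type == "CDS":
        cds_count += 1

# Use enumerate when the index is needed as well.
cds_positions = [i for i, feature_type in enumerate(features)
                 if feature_type == "CDS"]


def feature_stream():
    """Yield the feature types one at a time, like a lazy parser would."""
    for feature_type in features:
        yield feature_type


# A generator has no len(), so range(len(...)) cannot be used on it.
try:
    len(feature_stream())
    has_length = True
except TypeError:
    has_length = False

# enumerate still works, because it only needs an iterable.
streamed_positions = [i for i, feature_type in enumerate(feature_stream())
                      if feature_type == "CDS"]
```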