Parsing from a file into a dictionary in the correct order with Python


parsing python-2.7 dictionary


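I have written some code to parse an EMBL file and dump specific regions of the file into a dictionary.

The keys of the dictionary correspond to the labels of the specific regions I want to capture, and the value of each key is the region itself.

I then created another function that writes the contents of the dictionary to a text file.

However, I've found that the text file contains the information in a different order from the original EMBL file.

I don't understand why it does this. Is it because dictionaries are unordered? Is there any way around it?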

from Bio import SeqIO

s6633 = SeqIO.read("6633_seq.embl", "embl")

def make_dict_realgenes(x):
    dict = {}
    for i in range(len(x.features)):
        if x.features[i].type == 'CDS':
            if 'hypothetical' not in x.features[i].qualifiers['product'][0]:
                try:
                    if x.features[i].location.strand == -1:
                        x1 = x.features[i].location.end
                        y1 = x1 + 30
                        dict[str(x.features[i].qualifiers['product'][0])] =\
                             str(x[x1:y1].seq.reverse_complement())
                    else:
                        x2 = x.features[i].location.start
                        y2 = x2 - 30
                        dict[x.features[i].qualifiers['product'][0]] =\
                             str(x[y2:x2].seq)
                except KeyError:
                    if x.features[i].location.strand == -1:
                        x1 = x.features[i].location.end
                        y1 = x1 + 30
                        dict[str(x.features[i].qualifiers['translation'][0])] =\
                             str(x[x1:y1].seq.reverse_complement())
                    else:
                        x2 = x.features[i].location.start
                        y2 = x2 - 30
                        dict[x.features[i].qualifiers['translation'][0]] =\
                             str(x[y2:x2].seq)
    return dict

def rbs_file(dict):
    list = []
    c = 0
    for k, v in dict.iteritems():
        list.append(">" + k + " " + str(c) + "\n" + v + "\n")
        c = c + 1

    f = open("out.txt", "w")
    a = 0
    for i in list:
        f.write(i)
        a = a + 1

    f.close()

To preserve the order of the dictionary, use OrderedDict from collections. Try changing the top of your code to:

from collections import OrderedDict
from Bio import SeqIO

s6633 = SeqIO.read("6633_seq.embl", "embl")

def make_dict_realgenes(x):
    dict = OrderedDict()   
...
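A minimal, self-contained sketch of the behaviour, assuming made-up labels and sequences in place of the real EMBL features (the names regions, rbs and lines are hypothetical). It builds an OrderedDict and formats it the same way rbs_file does. Note that on Python 3.7 and later a plain dict also keeps insertion order as a language guarantee, so OrderedDict mainly matters on Python 2.7, as in this question.

```python
from collections import OrderedDict

# Hypothetical labels/sequences standing in for the parsed EMBL regions,
# listed in the order they would be encountered in the file.
regions = [
    ("geneB product", "ATGCCC"),
    ("geneA product", "ATGTTT"),
    ("geneC product", "ATGAAA"),
]

# Named "rbs" rather than "dict" to avoid shadowing the built-in.
rbs = OrderedDict()
for label, region in regions:
    rbs[label] = region

# Iteration follows insertion order, i.e. the order the features were parsed.
lines = []
for count, (label, region) in enumerate(rbs.items()):
    lines.append(">" + label + " " + str(count) + "\n" + region + "\n")

print("".join(lines))
```

On Python 2.7, rbs.iteritems() works on an OrderedDict too, so the loop in rbs_file can stay as it is.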

Also, if you can easily rename your variable, I'd suggest not overriding the built-in dict.
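To illustrate why, here is a small sketch of what happens once the name dict is rebound at module level: any later call to the dict type in that scope breaks until the name is removed again. The variable names error and restored are just for illustration.

```python
# Rebinding the name "dict" at module level hides the built-in type.
dict = {}

error = None
try:
    dict(a=1)  # "dict" is now an empty dictionary instance, which is not callable
except TypeError as exc:
    error = str(exc)

print(error)

# Deleting the module-level name makes the built-in visible again.
del dict
restored = dict(a=1)
print(restored)  # {'a': 1}
```

The same applies to list in rbs_file: any later list(...) call in that function would fail while the local name is bound.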

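I refactored your code a little. I'd suggest writing the output as you parse the file, rather than collecting it first and relying on the container to keep it in order: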

from Bio import SeqIO


output = open("out.txt", "w")

for seq in SeqIO.parse("CP001187.embl", "embl"):
    for feature in seq.features:
        if feature.type == "CDS":
            qualifier = (feature.qualifiers.get("product") or
                         feature.qualifiers.get("translation"))[0]
            if "hypothetical" not in qualifier:
                if feature.location.strand == -1: 
                    x1 = feature.location.end
                    x2 = x1 + 30
                    sequence = seq[x1:x2].seq.reverse_complement()
                else:
                    x1 = feature.location.start
                    x2 = x1 - 30
                    sequence = seq[x2:x1].seq

                output.write(">" + qualifier + "\n")
                output.write(str(sequence) + "\n")

                # You can still insert into an OrderedDict here if you need one,
                # e.g. (with d = OrderedDict() created before the loop):
                # d[qualifier] = str(sequence)

output.close()
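One edge case in both the question's code and this refactor: for a plus-strand CDS that starts fewer than 30 bases into the record, x1 - 30 is negative, and a negative slice start makes Python count from the end of the sequence, so the slice comes back empty (or otherwise wrong). Biopython's Seq and SeqRecord slicing follow the same Python rules, as far as I know. Below is a pure-Python sketch (so it runs without Biopython) of the upstream-region logic with the start clamped at 0. It assumes 0-based, end-exclusive coordinates like Biopython's feature locations and an uppercase ACGTN alphabet; upstream_region, reverse_complement and COMPLEMENT are hypothetical helpers, not Biopython API.

```python
# Hypothetical complement table; assumes uppercase A/C/G/T/N only.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq):
    """Reverse-complement a plain DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def upstream_region(genome, start, end, strand, flank=30):
    """Return up to `flank` bases upstream of a feature spanning genome[start:end].

    Plus strand: the bases just before `start`.
    Minus strand: the bases just after `end`, reverse-complemented so they
    read 5'->3' on the feature's own strand.
    """
    if strand == -1:
        # Slicing past the end of a string just truncates, so no clamp needed.
        return reverse_complement(genome[end:end + flank])
    # Clamp at 0: a negative start index would count from the end of the string.
    return genome[max(0, start - flank):start]


genome = "AAAACCCCGGGGTTTT"
print(upstream_region(genome, 8, 12, 1, flank=4))   # CCCC
print(upstream_region(genome, 2, 6, 1, flank=4))    # AA (clamped)
```

Without the max(0, ...) clamp, the last call would evaluate genome[-2:2], which is an empty string.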

In Python, for i in range(len(whatever)) is considered an anti-pattern.

There is also a cleaner way to output the sequences with Biopython. Instead of a dict or OrderedDict, append SeqRecord objects to a list:

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

my_seqs = []

# Each time you generate a sequence (inside the loop above), instead of
# writing to a file or inserting into a dict, do this:
my_seqs.append(SeqRecord(sequence, id=qualifier, description=""))

# Once you have my_seqs, they can be written in a single line:
SeqIO.write(my_seqs, "output.fas", "fasta")
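A list has a second advantage over a dict here: if two CDS features share the same product name, a dict keyed on that name silently keeps only the last one, while a list keeps every record in parse order. The sketch below shows this with hypothetical (label, sequence) pairs and a hypothetical to_fasta helper. It is a plain-Python stand-in so it runs without Biopython, not a replacement for SeqIO.write; the 60-character wrap width is an assumption chosen to resemble typical FASTA output.

```python
# Hypothetical (label, sequence) pairs; note the repeated product name.
records = [
    ("DNA polymerase", "AGGAGG"),
    ("ribosomal protein L2", "AGGAGA"),
    ("DNA polymerase", "GGAGGT"),
]

# A dict keyed on the label keeps only the last record for each name...
as_dict = {}
for label, seq in records:
    as_dict[label] = seq


# ...whereas formatting straight from the list keeps every record, in order.
def to_fasta(pairs, width=60):
    """Format (id, sequence) pairs as FASTA text, wrapping sequence lines."""
    chunks = []
    for ident, seq in pairs:
        chunks.append(">" + ident + "\n")
        for i in range(0, len(seq), width):
            chunks.append(seq[i:i + width] + "\n")
    return "".join(chunks)


print(len(as_dict))  # 2
print(to_fasta(records))
```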

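Yes, dictionaries are unordered. If the order matters, use a list or an OrderedDict.

Wow, thanks! But what's wrong with for i in range(len(anything))?

It's not pythonic. If you want to loop over the elements, use for element in list_of_elements. If you need the index, use for i, element in enumerate(list_of_elements). There are other reasons too: for example, if list_of_elements is a generator, it doesn't have to have a length.

Thanks, that's what I was looking for.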
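A short sketch of the three loop styles from the comments above, using a hypothetical list of feature types and a hypothetical generator feature_types(). The last part shows the generator point: len() fails on it, so range(len(...)) cannot be used, while enumerate() works fine.

```python
elements = ["CDS", "gene", "tRNA"]

# Non-idiomatic: an index used only to fetch the element back out.
via_index = []
for i in range(len(elements)):
    via_index.append(elements[i])

# Idiomatic: iterate over the elements directly.
direct = []
for element in elements:
    direct.append(element)

# Idiomatic when the index is needed as well.
numbered = []
for i, element in enumerate(elements):
    numbered.append("%d:%s" % (i, element))


def feature_types():
    """Hypothetical generator: iterable, but with no len()."""
    for t in ("CDS", "gene", "tRNA"):
        yield t


try:
    len(feature_types())
    has_len = True
except TypeError:
    has_len = False

# enumerate() only needs an iterable, so it works on the generator.
from_gen = list(enumerate(feature_types()))
print(numbered)  # ['0:CDS', '1:gene', '2:tRNA']
```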