Python: extract specific entries from a blastx output file and write them to a new file


python list dictionary


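I wrote a script that successfully searches a Blastx output file in XML format for a keyword (specified by the user). Now I need to write the records (query, hit, score, e-value, etc.) whose alignment title contains the keyword to a new file.

I have created separate lists for the query titles, hit titles, e-values and alignment lengths, but I can't seem to write them to the new file.

Question #1: What if Python hits an error and one of the lists ends up missing a value? All the other lists would then give the wrong information for a query ("line slippage", if you will...).
Question #2: Even if Python hits no error and all the lists are the same length, how do I write them to a file so that the first item of each list stays associated with the first item of the others (and likewise item #10 of each list)? Should I create a dictionary instead?

Question #3: A dictionary holds only one value per key, so what if my query has several different hits? I'm not sure whether the entry would be overwritten or skipped, or whether it would just raise an error. Any suggestions? My current script: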

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
import re

#obtain full path to blast output file (*.xml)
outfile = input("Full path to Blast output file (XML format only): ")

#obtain string to search for
search_string = input("String to search for: ")

#open the output file
result_handle = open(outfile)

#parse the blast record
blast_records = NCBIXML.parse(result_handle)

#initialize lists
query_list=[]
hit_list=[]
expect_list=[]
length_list=[]

#create 'for loop' that loops through each HIGH SCORING PAIR in each ALIGNMENT from each RECORD
for record in blast_records:
        for alignment in record.alignments:     #for description in record.descriptions???
                for hsp in alignment.hsps:      #for title in description.title???

                        #search for designated string
                        search = re.search(search_string, alignment.title)

                        #if search comes up with nothing, end
                        if search is None:
                                print ("Search string not found.")
                                break

                        #if search comes up with something, add it to a list of entries that match search string
                        else:

                                #option to include an 'exception' (if it finds keyword then DOES NOT add that entry to list)
                                if search is "trichomonas" or "entamoeba" or "arabidopsis":
                                        print ("found exception.")
                                        break
                                else:

                                        query_list.append(record.query)
                                        hit_list.append(alignment.title)
                                        expect_list.append(expect_val)
                                        length_list.append(length)

                                        #explicitly convert 'variables' ['int' object or 'float'] to strings
                                        length = str(alignment.length)
                                        expect_val = str(hsp.expect)

                                        #print ("\nquery name: " + record.query)
                                        #print ("alignment title: " + alignment.title)
                                        #print ("alignment length: " + length)
                                        #print ("expect value: " + expect_val)
                                        #print ("\n***Alignment***\n")
                                        #print (hsp.query)
                                        #print (hsp.match)
                                        #print (hsp.sbjct + "\n\n")


                                        if query_len is not hit_len is not expect_len is not length_len:
                                                print ("list lengths don't match!")
                                                break
                                        else:

                                                qrylen = len(query_list)
                                                query_len = str(qrylen)
                                                hitlen = len(hit_list)
                                                hit_len = str(hitlen)
                                                expectlen = len(expect_list)
                                                expect_len = str(expectlen)
                                                lengthlen = len(length_list)
                                                length_len = str(lengthlen)
                                                outpath = str(outfile)

                                                #create new file
                                                outfile = open("__Blast_Parse_Search.txt", "w")
                                                outfile.write("File contains entries from [" + outpath + "] that contain [" + search_string + "]")
                                                outfile.close

                                                #write list to file
                                                i = 0
                                                list_len = int(query_len)
                                                for i in range(0, list_len):

                                                        #append new file
                                                        outfile = open("__Blast_Parse_Search.txt", "a")
                                                        outfile.writelines(query_list + hit_list + expect_list + length_list)
                                                        i = i + 1

                                                #write to disk, close file
                                                outfile.flush()
                                                outfile.close

print ("query list length " + query_len)
print ("hit list length " + hit_len)
print ("expect list length " + expect_len)
print ("length list length " + length_len + "\n\n")
print ("first record: " + query_list[0] + " " + hit_list[0] + " " + expect_list[0] + " " + length_list[0])
print ("last record: " + query_list[-1] + " " + hit_list[-1] + " " + expect_list[-1] + " " + length_list[-1])
print ("\nFinished.\n")
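One detail in the script above is worth flagging before the file-writing questions: in `if search is "trichomonas" or "entamoeba" or "arabidopsis":` the non-empty string literals are always truthy, so the branch is taken every time and nothing is ever appended to the lists. A minimal sketch of the intended check follows, assuming the exclusion is meant to apply to the alignment title; the helper name `is_wanted` and the `EXCLUDED` tuple are made up for illustration.

```python
import re

# Hypothetical exclusion list, taken from the organism names in the script above.
EXCLUDED = ("trichomonas", "entamoeba", "arabidopsis")


def is_wanted(title, search_string, excluded=EXCLUDED):
    """Return True if title matches search_string and names no excluded organism."""
    if re.search(search_string, title) is None:
        return False
    lowered = title.lower()
    # Test each excluded name separately; `x is "a" or "b"` does not do this.
    return not any(name in lowered for name in excluded)
```

In the loop this would be called as `is_wanted(alignment.title, search_string)`, with `continue` rather than `break` if the remaining alignments should still be examined.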

If I understand your question correctly, you could use a default value to guard against the line slippage, for example:

try:
    x(list)
except Exception:
    append_default_value(list)
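Another way to rule out slippage altogether is to keep one tuple per hit instead of four parallel lists, so the fields of a record can never drift apart. The sketch below assumes tab-separated output is acceptable. The function name `write_hits` and the sample rows are made up; in the real script each row would be built from `record.query`, `alignment.title`, `hsp.expect` and `alignment.length`.

```python
import csv
import os
import tempfile


def write_hits(rows, path, source, search_string):
    """Write (query, hit, e-value, length) tuples as tab-separated lines."""
    with open(path, "w", newline="") as handle:
        handle.write(f"# entries from [{source}] that contain [{search_string}]\n")
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["query", "hit", "expect", "length"])
        writer.writerows(rows)  # one line per tuple, fields stay together


# Made-up rows standing in for values read from the BLAST records.
rows = [
    ("query_1", "hit_A", 1e-30, 250),
    ("query_1", "hit_B", 2e-05, 180),
]

# A temporary directory is used here only to keep the demo free of side effects.
out_path = os.path.join(tempfile.mkdtemp(), "__Blast_Parse_Search.txt")
write_hits(rows, out_path, "blast_output.xml", "kinase")
```

Opening the file once, after the loop has finished, also avoids reopening it in append mode for every record. If the four lists already exist, `zip(query_list, hit_list, expect_list, length_list)` produces the same kind of rows.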

Or use tuples as dictionary keys, such as (0,1,1), and use the get method to supply default values.
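A small sketch of that idea, which also bears on question #3: assigning to a key that already exists replaces the old value without raising an error, so keying on the query alone would keep only the last hit. Keying on a (query, hit) tuple is one way around that. The names and values below are made up for illustration.

```python
hits = {}

# Key on (query, hit title) so several hits for one query get separate entries.
hits[("query_1", "hit_A")] = {"expect": 1e-30, "length": 250}
hits[("query_1", "hit_B")] = {"expect": 2e-05, "length": 180}

# Assigning to an existing key silently replaces the old value; no error is raised.
hits[("query_1", "hit_B")] = {"expect": 3e-07, "length": 180}

# get() returns the supplied default instead of raising KeyError for a missing key.
default = {"expect": None, "length": None}
found = hits.get(("query_1", "hit_A"), default)
missing = hits.get(("query_2", "hit_A"), default)
```

Note that two HSPs from the same alignment would still share a key under this scheme; adding an HSP index to the tuple would keep them apart.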

If you need to preserve the data structure in the output file, you could try the shelve module.
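As a rough illustration, the standard-library `shelve` module stores picklable Python objects under string keys, so a list of records can be read back with its structure intact. The path `db_path`, the key `"rows"` and the sample rows are assumptions made for the example.

```python
import os
import shelve
import tempfile

# Made-up rows standing in for (query, hit title, e-value, alignment length).
rows = [
    ("query_1", "hit_A", 1e-30, 250),
    ("query_1", "hit_B", 2e-05, 180),
]

# A temporary directory is used only to keep the demo free of side effects.
db_path = os.path.join(tempfile.mkdtemp(), "blast_hits")

# Store the whole list under one string key.
with shelve.open(db_path) as db:
    db["rows"] = rows

# Reopen the shelf later and get the same list of tuples back.
with shelve.open(db_path) as db:
    restored = db["rows"]
```

The resulting file is not human-readable, so this suits a later Python session better than a report meant to be opened in a text editor.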

Or you could attach some kind of reference to each record and give every record a unique id, for example "\32{somekey:value}\21\22\44#".

Likewise, a tuple lets you use several values together as a single key.

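I don't know whether this helps. You might clarify which part of the code is giving you trouble, in the form "x() gives me this output, but I expected that one".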