Python: extract specific entries from a BLASTX output file and write them to a new file
I've written a script that successfully searches a BLASTX output file in XML format for a user-specified keyword. Now I need to write the records whose alignment title contains the keyword (query, hit, score, e-value, etc.) to a new file. I've created separate lists for the query titles, hit titles, e-values and alignment lengths, but I can't seem to write them to a new file.
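- Problem #1: What if Python messes up and one of the lists is missing a value...? Then all the other lists would give the wrong query information ("row slippage", if you will...).
- Problem #2: Even if nothing goes wrong and all the lists are the same length, how do I write them to a single file so that the first item in each list is associated with the others (and item #10 in each list is likewise associated)? Should I create a dictionary?
- Problem #3: A dictionary holds only one value per key, so what if a query has several different hits? I'm not sure whether the value would be overwritten, skipped, or just raise an error. Any suggestions?

My current script: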
```python
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
import re

#obtain full path to blast output file (*.xml)
outfile = input("Full path to Blast output file (XML format only): ")
#obtain string to search for
search_string = input("String to search for: ")

#open the output file
result_handle = open(outfile)
#parse the blast record
blast_records = NCBIXML.parse(result_handle)

#initialize lists
query_list=[]
hit_list=[]
expect_list=[]
length_list=[]

#create 'for loop' that loops through each HIGH SCORING PAIR in each ALIGNMENT from each RECORD
for record in blast_records:
    for alignment in record.alignments:
        #for description in record.descriptions???
        for hsp in alignment.hsps:
            #for title in description.title???
            #search for designated string
            search = re.search(search_string, alignment.title)
            #if search comes up with nothing, end
            if search is None:
                print ("Search string not found.")
                break
            #if search comes up with something, add it to a list of entries that match search string
            else:
                #option to include an 'exception' (if it finds keyword then DOES NOT add that entry to list)
                if search is "trichomonas" or "entamoeba" or "arabidopsis":
                    print ("found exception.")
                    break
                else:
                    query_list.append(record.query)
                    hit_list.append(alignment.title)
                    expect_list.append(expect_val)
                    length_list.append(length)
                    #explicitly convert 'variables' ['int' object or 'float'] to strings
                    length = str(alignment.length)
                    expect_val = str(hsp.expect)
                    #print ("\nquery name: " + record.query)
                    #print ("alignment title: " + alignment.title)
                    #print ("alignment length: " + length)
                    #print ("expect value: " + expect_val)
                    #print ("\n***Alignment***\n")
                    #print (hsp.query)
                    #print (hsp.match)
                    #print (hsp.sbjct + "\n\n")
            if query_len is not hit_len is not expect_len is not length_len:
                print ("list lengths don't match!")
                break
            else:
                qrylen = len(query_list)
                query_len = str(qrylen)
                hitlen = len(hit_list)
                hit_len = str(hitlen)
                expectlen = len(expect_list)
                expect_len = str(expectlen)
                lengthlen = len(length_list)
                length_len = str(lengthlen)
                outpath = str(outfile)
                #create new file
                outfile = open("__Blast_Parse_Search.txt", "w")
                outfile.write("File contains entries from [" + outpath + "] that contain [" + search_string + "]")
                outfile.close
                #write list to file
                i = 0
                list_len = int(query_len)
                for i in range(0, list_len):
                    #append new file
                    outfile = open("__Blast_Parse_Search.txt", "a")
                    outfile.writelines(query_list + hit_list + expect_list + length_list)
                    i = i + 1
                #write to disk, close file
                outfile.flush()
                outfile.close

print ("query list length " + query_len)
print ("hit list length " + hit_len)
print ("expect list length " + expect_len)
print ("length list length " + length_len + "\n\n")
print ("first record: " + query_list[0] + " " + hit_list[0] + " " + expect_list[0] + " " + length_list[0])
print ("last record: " + query_list[-1] + " " + hit_list[-1] + " " + expect_list[-1] + " " + length_list[-1])
print ("\nFinished.\n")
```
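One way to sidestep problems #1 and #2 is to stop keeping four parallel lists and instead store each matching HSP as a single tuple, so its query, hit, score, e-value and length can never drift apart. This is a minimal sketch, assuming the Biopython `NCBIXML` attribute names `record.query`, `record.alignments`, `alignment.title`, `alignment.length`, `alignment.hsps`, `hsp.score` and `hsp.expect`; the function names `collect_matches` and `write_matches`, the tab-separated layout and the header line are illustrative choices, not part of the original script. It uses `continue` rather than `break`, so one non-matching hit does not stop the scan of the remaining hits, and it checks exclusions with `in` on the lowercased title, because `search is "trichomonas" or "entamoeba" or "arabidopsis"` is always truthy.

```python
import csv
import re


def collect_matches(records, pattern, excluded=()):
    """Return one (query, hit_title, score, e_value, align_length) tuple per HSP
    whose alignment title matches `pattern` and names no excluded organism.

    Assumes Biopython NCBIXML-style objects: record.query, record.alignments,
    alignment.title, alignment.length, alignment.hsps, hsp.score, hsp.expect.
    """
    excluded = [word.lower() for word in excluded]
    rows = []
    for record in records:
        for alignment in record.alignments:
            title = alignment.title
            if re.search(pattern, title) is None:
                continue  # keyword absent: skip this hit but keep scanning the rest
            if any(word in title.lower() for word in excluded):
                continue  # keyword present, but the title names an excluded organism
            for hsp in alignment.hsps:
                # One tuple per HSP keeps every field of a hit on the same row.
                rows.append((record.query, title, hsp.score, hsp.expect, alignment.length))
    return rows


def write_matches(rows, path, source, pattern):
    """Write a comment line, a header row, then one tab-separated line per match."""
    with open(path, "w", newline="") as out:
        out.write(f"# Entries from [{source}] that contain [{pattern}]\n")
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["query", "hit", "score", "e_value", "align_length"])
        writer.writerows(rows)
```

With real BLAST output, call `collect_matches(NCBIXML.parse(handle), search_string, excluded=["trichomonas", "entamoeba", "arabidopsis"])` inside a `with open(xml_path) as handle:` block (`xml_path` being a hypothetical variable holding the path), because `NCBIXML.parse` reads the file lazily and needs the handle to stay open while it is consumed.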
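To guard against a missing value (problem #1), catch the error and append a default value instead, so all the lists stay the same length:

```python
try:
    x(list)  # placeholder: whatever call extracts the next value for this list
except Exception:
    append_default_value(list)  # placeholder: append a default so the lists stay aligned
```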
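A runnable version of that idea, assuming a hypothetical helper `append_field` and the placeholder default `"NA"`: it reads one attribute and, if the attribute is missing, appends the default instead, so every list still grows by exactly one item per hit.

```python
def append_field(target, obj, attribute, default="NA"):
    """Append obj.<attribute> to target, or `default` when the attribute is missing.

    Hypothetical helper: whatever happens, target grows by exactly one item,
    so parallel lists built with it stay the same length.
    """
    try:
        target.append(getattr(obj, attribute))
    except AttributeError:
        target.append(default)
```

Call it once per list for every hit, e.g. `append_field(expect_list, hsp, "expect")`. Catching only `AttributeError` rather than every `Exception` keeps unrelated bugs visible.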
Or use tuples as dictionary keys, such as `(0,1,1)`, and use the `get` method to supply default values.
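A minimal sketch of tuple keys, assuming each key is the position `(record_index, alignment_index, hsp_index)` of an HSP and the same Biopython-style attributes as the original script; `index_hits`, `lookup` and `MISSING` are hypothetical names. `dict.get` returns the fallback value instead of raising `KeyError` when a key is absent.

```python
MISSING = ("NA", "NA", "NA")  # hypothetical placeholder for keys with no entry


def index_hits(records):
    """Map (record_index, alignment_index, hsp_index) -> (query, hit_title, e_value).

    Assumes Biopython NCBIXML-style objects (record.query, alignment.title, hsp.expect).
    """
    hits = {}
    for r, record in enumerate(records):
        for a, alignment in enumerate(record.alignments):
            for h, hsp in enumerate(alignment.hsps):
                hits[(r, a, h)] = (record.query, alignment.title, hsp.expect)
    return hits


def lookup(hits, key):
    """Return the entry stored under key, or MISSING instead of raising KeyError."""
    return hits.get(key, MISSING)
```

Here `lookup(hits, (0, 1, 1))` is the second HSP of the second alignment of the first record.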
If you need to preserve the data structure in the output file, you could try `shelve`.
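A minimal `shelve` sketch, assuming the data is a dict that maps each query name to a list of `(hit, e_value, length)` tuples; `save_records` and `load_records` are hypothetical names. `shelve` pickles each value, so a list of several hits per query round-trips intact, but its keys must be strings, and the result is a binary database rather than a human-readable text file.

```python
import shelve


def save_records(path, records):
    """Store a {query: [(hit, e_value, length), ...]} dict in a shelve database."""
    with shelve.open(path) as db:  # creates the database file(s) if needed
        for query, hits in records.items():
            db[query] = hits  # keys must be str; values are pickled


def load_records(path):
    """Read the whole shelve database back into an ordinary dict."""
    with shelve.open(path, flag="r") as db:
        return dict(db)
```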
Alternatively, you could append some kind of reference after each record and give each record a unique ID, e.g. `\32{somekey:value}\21\22\44#`.
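Instead of custom delimiter bytes, this sketch swaps in JSON Lines (one JSON object per line), which gives each record a unique `id` and keeps its fields together in a format that is easy to read back. The field names, the `(query, hit, e_value, length)` row shape and the sequential ID scheme are assumptions for illustration; `write_jsonl` and `read_jsonl` are hypothetical names.

```python
import json


def write_jsonl(path, rows):
    """Write each (query, hit, e_value, length) row as one JSON object per line,
    tagged with a unique sequential id (an illustrative ID scheme)."""
    with open(path, "w") as out:
        for record_id, (query, hit, e_value, length) in enumerate(rows, start=1):
            entry = {"id": record_id, "query": query, "hit": hit,
                     "e_value": e_value, "length": length}
            out.write(json.dumps(entry) + "\n")


def read_jsonl(path):
    """Read the records back; each line is parsed independently."""
    with open(path) as handle:
        return [json.loads(line) for line in handle if line.strip()]
```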
Similarly, you can use a tuple to combine several keys into one.
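This also settles problem #3: assigning to an existing dictionary key silently replaces the old value, with no error and no skip. The sketch below contrasts a dict keyed on the query alone with one keyed on a `(query, hit)` tuple; the function names and the three-field `(query, hit, e_value)` rows are hypothetical.

```python
def by_query(rows):
    """Key on the query alone: a later hit for the same query silently replaces an earlier one."""
    return {query: (hit, e_value) for query, hit, e_value in rows}


def by_query_and_hit(rows):
    """Key on a (query, hit) tuple: each distinct hit of a query keeps its own entry."""
    return {(query, hit): e_value for query, hit, e_value in rows}
```

If the same query-hit pair can have several HSPs, add the HSP index to the tuple as well, e.g. `(query, hit, hsp_index)`.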
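I don't know whether this helps; you might clarify which parts of your code are causing problems, e.g. "`x()` gives me output `y`, but I want `z`".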