UCSC BLAT output in Python
Is there a way to use Python to get the positions of the mismatches from the following BLAT result?
00000001 taaaagatgaagtttctatcatccaaaaaatgggctacagaaacc 00000045
<<<<<<<< |||||||||||||||||||||||||||  |||||||||||||||| <<<<<<<<
41629392 taaaagatgaagtttctatcatccaaagtatgggctacagaaacc 41629348
00000001 TAAAAGATGAAGTTTCATCATCCAAAAAAATGGGCTACAGAAACC 00000045
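You can find the mismatches with the string's .find method. A mismatch is shown as a space (' ') in the middle line of the BLAT output, so that is the character we search for. I don't know BLAT personally, so I'm not sure whether the output always comes in groups of three lines, but assuming it does, the following function returns a list of mismatch positions. Each entry is a tuple holding the mismatch position in the top sequence and the corresponding position in the bottom sequence.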
blat_src = """00000001 taaaagatgaagtttctatcatccaaaaaatgggctacagaaacc 00000045
<<<<<<<< |||||||||||||||||||||||||||  |||||||||||||||| <<<<<<<<
41629392 taaaagatgaagtttctatcatccaaagtatgggctacagaaacc 41629348"""

def find_mismatch(blat):
    # break the BLAT input into lines
    lines = blat.split("\n")
    # give some friendly names to the different lines
    seq_a = lines[0]
    seq_b = lines[2]
    # we're not interested in the '<' and '>' markers, so strip them out with a slice
    matchstr = lines[1][9:-9]
    # get the integer values of the starts of each sequence segment
    pos_a = int(seq_a[:8])
    pos_b = int(seq_b[:8])
    results = []
    # find the index of the first space character; mmpos = mismatch position
    mmpos = matchstr.find(" ")
    # loop while a space exists (find returns -1 if none is found)
    while mmpos != -1:
        # the position of the mismatch is the start position of the
        # sequence plus the index within the segment
        results.append((pos_a + mmpos, pos_b + mmpos))
        # search the rest of the string (from mmpos+1 onwards)
        mmpos = matchstr.find(" ", mmpos + 1)
    return results

print(find_mismatch(blat_src))
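[(28, 41629419), (29, 41629420)]
This tells us that positions 28 and 29 (indexed by the top sequence), or positions 41629419 and 41629420 (indexed by the bottom sequence), do not match.
Do you want the position shown in the top sequence or in the bottom one? Both?
I want it from the bottom sequence.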
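Since the position is wanted from the bottom sequence, one detail of the alignment above may matter: the bottom row is labelled 41629392 at its left end and 41629348 at its right end, which suggests that its coordinates run downwards from left to right. The sketch below is one way to allow for that, under the assumption that each sequence row has the form "start sequence end" and that the end label tells us the direction. The name find_mismatch_directional is made up for this illustration. It also compares the two sequence rows directly instead of reading the middle line, so it does not depend on the spacing of the match line being preserved.

```python
def find_mismatch_directional(blat):
    """Return (top_pos, bottom_pos) for every mismatched column.

    Hypothetical helper: assumes a three-line block whose first and
    third lines look like '<start> <sequence> <end>'.
    """
    lines = blat.strip().split("\n")
    start_a, seq_a, end_a = lines[0].split()
    start_b, seq_b, end_b = lines[2].split()
    start_a, end_a = int(start_a), int(end_a)
    start_b, end_b = int(start_b), int(end_b)
    # Step +1 when a row's coordinates increase left to right, -1 when they decrease.
    step_a = 1 if end_a >= start_a else -1
    step_b = 1 if end_b >= start_b else -1
    results = []
    # Compare the two rows column by column (case-insensitively).
    for i, (base_a, base_b) in enumerate(zip(seq_a, seq_b)):
        if base_a.lower() != base_b.lower():
            results.append((start_a + step_a * i, start_b + step_b * i))
    return results


blat_src = """00000001 taaaagatgaagtttctatcatccaaaaaatgggctacagaaacc 00000045
<<<<<<<< |||||||||||||||||||||||||||  |||||||||||||||| <<<<<<<<
41629392 taaaagatgaagtttctatcatccaaagtatgggctacagaaacc 41629348"""

print(find_mismatch_directional(blat_src))
# [(28, 41629365), (29, 41629364)]
```

With this reading, the last column of the bottom row comes out as 41629392 - 44 = 41629348, which agrees with the end label printed on that row. If the bottom coordinates are in fact meant to increase, the simpler function above gives the right numbers instead.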