Python: counting the maximum k-mer repeat frequency
If n is 3, the sequence ATATATATAG contains 4x ATA, 3x TAT and 1x TAG. The most frequent 3-mer (ATA) accounts for 4 of the 8 overlapping 3-mers, so the ratio is 4/8 = 0.5. The higher this number, the more repetitive the sequence.
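The counts above can be checked by listing every overlapping window of length 3 in ATATATATAG and tallying them. A sequence of length L has L - n + 1 such windows, here 10 - 3 + 1 = 8. The sketch below uses the standard library's collections.Counter. The variable names are illustrative only.

```python
from collections import Counter

seq = "ATATATATAG"
n = 3

# Every overlapping window of length n: start positions 0 .. len(seq) - n
windows = [seq[i:i + n] for i in range(len(seq) - n + 1)]
counts = Counter(windows)

print(windows)  # ['ATA', 'TAT', 'ATA', 'TAT', 'ATA', 'TAT', 'ATA', 'TAG']
print(counts["ATA"], counts["TAT"], counts["TAG"])  # 4 3 1
print(max(counts.values()) / len(windows))  # 0.5
```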
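Write a function simple(S, n), where S is the sequence and n is the length of the k-mers to consider. The function should return the ratio described above.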
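Here is a minimal sketch of such a function. It assumes overlapping windows, as in the ATATATATAG example, and that n lies between 1 and the length of S. The ValueError check is an addition for illustration and is not part of the original question.

```python
from collections import Counter


def simple(s: str, n: int) -> float:
    """Return (count of the most frequent k-mer) / (number of k-mers) in s.

    k-mers are overlapping windows of length n, so a sequence of
    length L yields L - n + 1 of them.
    """
    # Assumed validation: reject window sizes that give no k-mers
    if not 1 <= n <= len(s):
        raise ValueError("n must be between 1 and len(s)")
    kmers = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    # most_common(1) returns a one-element list [(kmer, count)]
    _, top_count = kmers.most_common(1)[0]
    return top_count / (len(s) - n + 1)


print(simple("ATATATATAG", 3))  # 0.5
```

Counter does the tallying in a single pass, so the running time is linear in the number of windows for a fixed n.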
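Can anyone help me?

This looks like homework, but at least it is a good brain teaser. Hint: itertools, generators and collections are very handy for this kind of problem.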
import itertools
import collections

ACIDS = ('A', 'C', 'T', 'G')

def walk_seq(s, chunk_size):
    """Yield every overlapping window of length chunk_size."""
    assert len(s) >= chunk_size
    for i in range(0, len(s) - chunk_size + 1):
        yield s[i:i+chunk_size]

def simple(s, n):
    snip_counts = collections.defaultdict(int)
    for chunk in walk_seq(s, n):
        # Compare the window with every possible n-mer built from ACIDS
        for snip_tuple in itertools.product(ACIDS, repeat=n):
            snip = ''.join(snip_tuple)
            if chunk == snip:
                snip_counts[snip] += 1
    total_matches = sum(snip_counts.values())
    maxi = max(snip_counts.values())
    return float(maxi) / total_matches

print(simple('ATATATATAG', 3))  # 0.5
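The inner itertools.product loop in the answer above compares each window with all 4**n possible k-mers, so its work grows exponentially with n. Its only effect is that windows containing characters other than A, C, G and T are never counted. The sketch below keeps that filtering behaviour but tests each window's characters directly. The names simple_filtered and ALPHABET are hypothetical and do not come from the answer.

```python
import collections

# Hypothetical name; same letters as the answer's ACIDS tuple
ALPHABET = frozenset("ACGT")


def simple_filtered(s, n):
    """Max k-mer frequency, counting only k-mers made of A, C, G, T.

    Like the itertools.product version, the denominator is the number of
    valid k-mers, not len(s) - n + 1, when s contains other characters.
    """
    counts = collections.Counter(
        chunk
        for chunk in (s[i:i + n] for i in range(len(s) - n + 1))
        if set(chunk) <= ALPHABET  # every character is a valid base
    )
    return max(counts.values()) / sum(counts.values())


print(simple_filtered("ATATATATAG", 3))  # 0.5
print(simple_filtered("ATANATAT", 3))  # 0.6666666666666666
```

In "ATANATAT", the windows TAN, ANA and NAT are skipped. That leaves ATA twice and TAT once, giving 2/3.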
This is a nice algorithm problem that is worth trying yourself, although the solution here is not much of a challenge:
s = "ATATATATAG"
n = 3

def simple(s, n):
    dictionary = {}
    total = 0
    for i in range(len(s) - (n - 1)):  # (n-1) so the last window is included
        k = i + n
        if s[i:k] in dictionary:
            dictionary[s[i:k]] += 1
        else:
            dictionary[s[i:k]] = 1
        total += 1  # counted here to avoid sum(dictionary.values())
    for key, value in dictionary.items():
        dictionary[key] = value / total
        # As a challenge, rewrite the line above as a lambda function
    print(dictionary)

simple(s, n)
# sample output (Python 3.7+, where dicts keep insertion order)
# {'ATA': 0.5, 'TAT': 0.375, 'TAG': 0.125}
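The answer above prints each k-mer's share of the windows, but the question asks for a single number: the share of the most frequent k-mer. The sketch below derives that number from the same kind of dictionary. It also uses a dict comprehension for the value/total step, which is one possible take on the "challenge" comment. The function name kmer_fractions is hypothetical.

```python
def kmer_fractions(s, n):
    """Share of each overlapping k-mer among all len(s) - n + 1 windows."""
    counts = {}
    for i in range(len(s) - n + 1):
        kmer = s[i:i + n]
        counts[kmer] = counts.get(kmer, 0) + 1
    total = len(s) - n + 1
    # Dict comprehension replacing the value/total loop
    return {kmer: c / total for kmer, c in counts.items()}


fractions = kmer_fractions("ATATATATAG", 3)
print(fractions)  # {'ATA': 0.5, 'TAT': 0.375, 'TAG': 0.125}
print(max(fractions.values()))  # 0.5, the ratio the question asks for
print(max(fractions, key=fractions.get))  # ATA
```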
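You probably want a for loop that runs from the start to the end of the input string, keeping a Python dictionary of every length-n (here n = 3) substring encountered in the input sequence. Can you post what you have done so far, or even your own ideas about how to solve the problem? This is not a homework service.

Nice golf, but why import anything? (seq[i:i+n] for i in range(len(seq)-n+1)) yields the same generator over the n-grams.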
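Following the comment's suggestion, here is a minimal import-free sketch built on the same slicing expression. The windows are collected into a list instead of a generator, because they are traversed twice: once by map and once by each count call. Each kmers.count call scans the whole list, so this takes quadratic time in the number of windows. That is acceptable for short sequences, but a dictionary or collections.Counter scales better.

```python
def simple(seq, n):
    # Same slicing as the comment's generator, stored as a list so it
    # can be counted repeatedly and measured with len()
    kmers = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    # Count of the most frequent k-mer divided by the number of k-mers
    return max(map(kmers.count, kmers)) / len(kmers)


print(simple("ATATATATAG", 3))  # 0.5
```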