Printing one large list from a file into multiple sublists in Python, with overlapping sequences


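I currently have a very long sequence in a file. I want to split it into smaller subsequences, with each subsequence overlapping the previous one, and put them into a list. Here is an example of what I mean:

(Apologies for the cryptic sequence; it is all on one line.)

At the moment I can split the sequence into smaller subsequences, without any overlap, using the following code: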

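def chunks(seq, n):
    # Average (possibly fractional) chunk length
    division = len(seq) / float(n)
    # Round each boundary so the n chunks cover seq with no overlap
    return [seq[int(round(division * i)):int(round(division * (i + 1)))] for i in range(n)]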

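In the code above, n specifies how many subsequences the list is split into.

I was thinking of grabbing the end of each subsequence and hard-coding it onto the end of the elements in the list... but that would be inefficient and tedious. Is there a simple way to do this?

In practice, I need an overlap of about 100 characters.

Thanks, everyone.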

>>> seq = "abcdefessdfekgheithrfkopeifhghtryrhfbcvdfersdwtiyuyrterdhcbgjherytyekdnfiwytowihfiwoeirehjiwoqpft"
>>> n = 4
>>> overlap = 5
>>> division = len(seq) // n  # floor division, so the slice indices stay integers
>>> [seq[i*division:(i+1)*division+overlap] for i in range(n)]
['abcdefessdfekgheithrfkopeifhg', 'eifhghtryrhfbcvdfersdwtiyuyrt', 'yuyrterdhcbgjherytyekdnfiwyto', 'iwytowihfiwoeirehjiwoqpft']

It may be slightly more efficient to do it this way:

>>> [seq[i:i+division+overlap] for i in range(0,n*division,division)]
['abcdefessdfekgheithrfkopeifhg', 'eifhghtryrhfbcvdfersdwtiyuyrt', 'yuyrterdhcbgjherytyekdnfiwyto', 'iwytowihfiwoeirehjiwoqpft']
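For reuse, the interactive snippet above could be wrapped in a function. The following is only a sketch: the name chunks_with_overlap and the guard against an empty chunk length are assumptions added for illustration, not part of the original answer.

```python
def chunks_with_overlap(seq, n, overlap):
    """Split seq into n chunks, each extended by `overlap` items into the next chunk."""
    # Floor division keeps the slice indices integral.
    division = len(seq) // n
    if division == 0:
        # Hypothetical guard: range() rejects a step of zero.
        raise ValueError("n must not exceed len(seq)")
    # Any leftover items (len(seq) % n) are only captured if they fit
    # inside the overlap appended to the last chunk.
    return [seq[i:i + division + overlap] for i in range(0, n * division, division)]


seq = "abcdefessdfekgheithrfkopeifhghtryrhfbcvdfersdwtiyuyrterdhcbgjherytyekdnfiwytowihfiwoeirehjiwoqpft"
result = chunks_with_overlap(seq, 4, 5)
```

Note that this approach fixes the number of chunks rather than their length, so the chunk length follows from len(seq) and n.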

If you want to split the sequence seq into subsequences of length length, with overlap characters/elements shared between each subsequence and its successor:

def split_with_overlap(seq, length, overlap):
    return [seq[i:i+length] for i in range(0, len(seq), length - overlap)]

Then test it on the original data:

>>> seq = 'abcdefessdfekgheithrfkopeifhghtryrhfbcvdfersdwtiyuyrterdhcbgjherytyekdnfiwytowihfiwoeirehjiwoqpft'

>>> split_with_overlap(seq, 31, 5)
['abcdefessdfekgheithrfkopeifhght', 'fhghtryrhfbcvdfersdwtiyuyrterdh', 'terdhcbgjherytyekdnfiwytowihfiw', 'ihfiwoeirehjiwoqpft']
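Two edge cases of split_with_overlap may be worth guarding against: when overlap equals length the range step is zero and range() raises ValueError, and when the tail of seq is no longer than overlap the final chunk only repeats items already present in the previous chunk. The variant below is a sketch under those assumptions; the name split_with_overlap_checked and the choice to drop the redundant trailing chunk are illustrative, not part of the original answer.

```python
def split_with_overlap_checked(seq, length, overlap):
    """Like split_with_overlap, with argument checks and no redundant trailing chunk."""
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    if len(seq) == 0:
        return []
    step = length - overlap
    # A chunk starting at i (i > 0) adds new items only if i + overlap < len(seq),
    # so starts are limited to values below len(seq) - overlap (but always include 0).
    last_start = max(len(seq) - overlap, 1)
    return [seq[i:i + length] for i in range(0, last_start, step)]


seq = 'abcdefessdfekgheithrfkopeifhghtryrhfbcvdfersdwtiyuyrterdhcbgjherytyekdnfiwytowihfiwoeirehjiwoqpft'
result = split_with_overlap_checked(seq, 31, 5)
short = split_with_overlap_checked("abcdefghij", 4, 2)
```

On the original data this returns the same four chunks as split_with_overlap; on "abcdefghij" with length 4 and overlap 2 it stops at 'ghij' instead of also emitting a trailing 'ij'.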

Thank you very much for your answer! This makes a lot of sense to me.
