Python: overcoming MemoryError / slow runtime in the Ashton String task


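The goal of the task is:

Arrange all the distinct substrings of a given string in lexicographical order and concatenate them. Print the K-th character of the concatenated string. It is guaranteed that the given value of K is valid, i.e. a K-th character exists.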

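Input format:

The first line contains a number T, the number of test cases. The first line of each test case contains a string of characters (a-z), and the second line contains a number K.

Output format:

Print the K-th character (the string is 1-indexed).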

The constraints are:

1 ≤ T ≤ 5
1 ≤ length ≤ 10^5
K will be an appropriate integer.

For example, given the input:

1
dbac
3

the output would be:

c

I tried to do the task with this code, and it works for relatively short strings:

from itertools import chain

def zipngram(text, n=2):
    words = list(text)
    return zip(*[words[i:] for i in range(n)])

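for _ in range(int(input())):  # T test cases
    t = input()
    position = int(input()) - 1  # convert the 1-based K to a 0-based index
    chargrams = chain(*[zipngram(t, i) for i in range(1, len(t) + 1)])
    # a set keeps only the distinct substrings, as the task requires
    concatstr = ''.join(sorted({''.join(s) for s in chargrams}))
    print(concatstr[position])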

But if the input file looks like this, and the desired output is:

l
s
y
h
s

the interpreter throws a MemoryError:

Traceback (most recent call last):
  File "solution.py", line 11, in <module>
    chargrams = chain(*[zipngram(t,i) for i in range(1,len(t)+1)])
  File "solution.py", line 11, in <listcomp>
    chargrams = chain(*[zipngram(t,i) for i in range(1,len(t)+1)])
  File "solution.py", line 6, in zipngram
    return zip(*[words[i:] for i in range(n)])
  File "solution.py", line 6, in <listcomp>
    return zip(*[words[i:] for i in range(n)])
MemoryError

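Is there a way to strike a balance between using too much memory (which leads to a MemoryError) and a runtime that is too slow (when using heapq pushes and pops)?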

A MemoryError means that the program consumed all available memory and therefore crashed.
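To get a feel for why memory runs out, it helps to count how much text a brute-force approach has to hold. The sketch below assumes the worst case in which every substring is kept (duplicates included): a string of length n has n - L + 1 substrings of each length L. The helper name `total_substring_chars` is made up for this illustration.

```python
def total_substring_chars(n):
    """Total characters across all substrings of a length-n string.

    There are n - L + 1 substrings of each length L, so the total is
    sum(L * (n - L + 1)) for L = 1..n, which equals n(n+1)(n+2)/6.
    """
    return n * (n + 1) * (n + 2) // 6


# Cross-check the closed form against the direct sum for a small n.
assert total_substring_chars(4) == sum(L * (4 - L + 1) for L in range(1, 5))

for n in (4, 1000, 10**5):
    print(n, total_substring_chars(n))
```

For n = 10^5 the count is on the order of 10^14 characters, so any approach that materializes the substrings will run out of memory near the upper limit, however the substrings are stored.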

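A possible solution is to use lazy iterables, which compute values only on demand rather than all at once (they also work in Py2, but Py3 supports them better).

Adapting your program to generators should need only minor changes. The main one is to index the generator without converting it to a list, which would nullify the benefit of laziness.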
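As a small illustration of indexing a lazy iterable without building a list, the sketch below uses `itertools.islice`. The helper `nth` and the generator `substrings_of_length` are hypothetical names introduced only for this example. Laziness alone does not solve the task, because sorting still needs all items at once; the snippet only shows the indexing idiom.

```python
from itertools import islice


def substrings_of_length(text, n):
    """Lazily yield every length-n substring of text, left to right."""
    for i in range(len(text) - n + 1):
        yield text[i:i + n]


def nth(iterable, index):
    """Return the item at position index (0-based) without building a list."""
    return next(islice(iterable, index, None))


# Third 2-gram of 'dbac' (the 2-grams are 'db', 'ba', 'ac').
print(nth(substrings_of_length("dbac", 2), 2))  # ac
```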

Try this code; it works for large samples:

def ashton(string, k):
    # We need all the distinct substrings, in sorted order
    sortedSubstrings = sorted_substrings(string)
    count = 0
    currentSubstring = 0
    # Loop through the substrings until we reach the k-th character
    while count < k:
        substringLen = len(sortedSubstrings[currentSubstring])
        # Add the number of characters of the substring to our counter
        count += substringLen
        # Advance to the next substring
        currentSubstring += 1
    # The loop has already stepped one past the substring that holds the
    # k-th character, so step back by one before indexing into it
    diff = count - k
    return sortedSubstrings[currentSubstring - 1][substringLen - diff - 1]

# Determine the distinct substrings in lexicographical order
# Input: 'dbac', returns: a, ac, b, ba, bac, c, d, db, dba, dbac
def sorted_substrings(string):
    substrings = set()
    length = len(string)
    # Loop over every start index i and every end index j > i
    for i in range(length):
        for j in range(i + 1, length + 1):
            # The set drops duplicate substrings
            substrings.add(string[i:j])
    # Sorting the set gives a list in lexicographical order
    return sorted(substrings)

t = int(input())
for _ in range(t):
    s = input()
    k = int(input())
    print(ashton(s, k))

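Could you try it on this input? Does it also run into a MemoryError?

@alvas Please try my code; it does not raise a MemoryError and returns the correct results.

Patience, you must have. Come they will, the votes, when near the bounty is. Also, unrolling the loop in

sortedSubstrings = sorted(set([string[x:y] for x in range(length) for y in range(length) if string[x:y]]))

could easily earn you the upvotes =)

@alvas I have now rewritten that complicated line as its own function, which makes it easier to read. The sorted_substrings function puts all substrings in lexicographical order, so for "dbac" it returns the sorted list a, ac, b, ... Once we have the sorted substrings, the while loop compares count against k and increases count as we look at each substring. In the simple test case with k=3, we first look at "a", which increases count by 1. Then comes "ac", which increases count to 3. Now count equals k, so the loop ends.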
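The counting walk-through above can be traced on the sample input. This is a sketch, not the answerer's code: it swaps the explicit while loop for `itertools.accumulate` and `bisect`, and the name `kth_char_bruteforce` is an assumption made for the example. It still builds every distinct substring, so it only suits short strings.

```python
from bisect import bisect_left
from itertools import accumulate


def kth_char_bruteforce(string, k):
    """Return the k-th (1-based) character of the sorted, concatenated substrings."""
    n = len(string)
    substrings = sorted({string[i:j] for i in range(n) for j in range(i + 1, n + 1)})
    # running[i] = number of characters in substrings[0..i] combined
    running = list(accumulate(len(sub) for sub in substrings))
    # The first substring whose running total reaches k holds the k-th character
    idx = bisect_left(running, k)
    chars_before = running[idx] - len(substrings[idx])
    return substrings[idx][k - chars_before - 1]


# Running totals for 'dbac' are 1 (a), 3 (ac), 4 (b), ..., so k=3 lands in 'ac'.
print(kth_char_bruteforce("dbac", 3))  # c
```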

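One way to balance memory and runtime, sketched here as an alternative rather than as the method of either answer, is to avoid building the substrings at all. If the suffixes of the string are sorted, each suffix contributes only those prefixes that are longer than its longest common prefix (LCP) with the previous suffix, and visiting them in that order yields the distinct substrings in lexicographical order. The function names below (`suffix_array`, `lcp_array`, `kth_char`) are assumptions for this sketch. It uses prefix doubling and Kasai's LCP construction, assumes a non-empty string, and has not been tuned against any judge's time limit.

```python
def suffix_array(s):
    """Start indices of the suffixes of s in sorted order (prefix doubling)."""
    n = len(s)
    rank = [ord(c) for c in s]  # rank by first character
    sa = list(range(n))
    k = 1
    while True:
        def key(i):
            # Compare by the first k characters, then by the next k characters
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            return sa
        k *= 2


def lcp_array(s, sa):
    """lcp[p] = common prefix length of suffixes sa[p] and sa[p - 1] (Kasai)."""
    n = len(s)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def kth_char(s, k):
    """k-th (1-based) character of the sorted, concatenated distinct substrings."""
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    n = len(s)
    for pos, start in enumerate(sa):
        lo = lcp[pos] + 1  # shortest prefix not already seen
        hi = n - start     # longest prefix: the whole suffix
        total = (lo + hi) * (hi - lo + 1) // 2  # characters in lengths lo..hi
        if k > total:
            k -= total
            continue
        for length in range(lo, hi + 1):
            if k <= length:
                return s[start + k - 1]
            k -= length
    raise ValueError("k exceeds the length of the concatenated string")


print(kth_char("dbac", 3))  # c
```

Memory stays linear in the length of the string, because only integer arrays of that length are kept; the cost moves into the repeated sorting inside `suffix_array`.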