How can I reduce the time complexity when searching for a bigram in a large file in Python?
I am processing a very large file in Python. I need to check whether a specific bigram exists in that file. I have written code that produces the correct output, but it is too slow. Is there a better alternative?
def check(word1, word2):
    # Build the phrase once instead of rebuilding it on every line
    phrase = word1 + " " + word2
    # Raw string: "D:\bigram.txt" would turn \b into a backspace character
    with open(r"D:\bigram.txt", "r") as file:
        for line in file:
            if phrase in line:
                return 1
    return -1
Load the entire file into RAM (if it fits).
For more information, see:
Hope this helps. There is no time complexity better than O(n) for this problem, and the obvious solution is already O(n). @aranfey OK, thanks for your answer.
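A minimal sketch of the load-it-all-into-RAM idea, assuming the file fits in memory: read the whole file once and do a single substring search instead of looping line by line. The temporary file below only stands in for `D:\bigram.txt` so the example is self-contained; the helper name `check_in_memory` is my own.

```python
import os
import tempfile

def check_in_memory(path, word1, word2):
    # One read, one substring scan over the whole contents
    phrase = word1 + " " + word2
    with open(path, "r") as f:
        return 1 if phrase in f.read() else -1

# Small temporary file standing in for the real bigram file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("new york\nmachine learning\nbig data\n")
    path = tmp.name

print(check_in_memory(path, "machine", "learning"))  # 1
print(check_in_memory(path, "san", "francisco"))     # -1
os.remove(path)
```

Note this assumes the bigram never spans a line break, which also holds for the original line-by-line version.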
import mmap

def check(word1, word2):
    # mmap yields bytes, so encode the search phrase once up front
    phrase = (word1 + " " + word2).encode("utf-8")
    with open(r"D:\bigram.txt", "rb") as f:
        # Length 0 maps the ENTIRE file into memory!
        # access=mmap.ACCESS_READ is read-only and works on both
        # Unix and Windows (prot=mmap.PROT_READ is Unix-only)
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # Proceed as before -- the file is already in memory,
        # so readline() here is as fast as it can be
        line = m.readline()
        while line:
            if phrase in line:
                return 1
            line = m.readline()
    return -1