How can I reduce the time complexity of searching for a bigram in a large file in Python?


I am processing a very large file in Python. I need to check whether a specific bigram exists in that file. I have written code that produces the correct output, but it is too slow. Is there a better alternative?

def check(word1, word2):
    phrase = word1 + " " + word2  # build the phrase once, not on every line
    with open(r"D:\bigram.txt", "r") as file:  # raw string so \b isn't an escape
        for line in file:
            if phrase in line:
                return 1
    return -1
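Since every byte must be examined at least once, the scan is necessarily O(n); what can be reduced is the per-line Python overhead. As a sketch (the helper name `check_chunked`, the `path` parameter, and the 1 MB default chunk size are illustrative, not from the thread), one option is to read the file in large chunks and keep a small overlap so a phrase straddling two chunks is not missed:

```python
def check_chunked(word1, word2, path, chunk_size=1 << 20):
    """Return 1 if 'word1 word2' occurs in the file, else -1.

    Reads in large chunks (1 MB by default) to avoid per-line overhead;
    memory use stays flat and the scan remains a single O(n) pass.
    """
    phrase = word1 + " " + word2
    overlap = len(phrase) - 1  # carried over so a match spanning chunks is found
    tail = ""
    with open(path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1
            combined = tail + chunk
            if phrase in combined:
                return 1
            tail = combined[-overlap:]
```

The `in` operator on `str` already runs an optimized C substring search, so most of the win here comes simply from making fewer Python-level calls per megabyte.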
Load the entire file into RAM (if there is enough). For more information, see:


Hope this helps.

There is no better time complexity than O(n), and the obvious solution is already O(n). @aranfey OK, thanks for your answer.
import mmap

def check(word1, word2):
    phrase = (word1 + " " + word2).encode()  # the mmap yields bytes, not str
    with open(r'D:\bigram.txt', 'rb') as f:
        # Length 0 will map the ENTIRE file into memory!
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # read-only mapping

        # Proceed with your code here -- note the file is already in memory,
        # so readline() here will be as fast as it can be
        line = m.readline()
        while line:
            if phrase in line:
                return 1
            line = m.readline()
    return -1
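For illustration, here is a self-contained sketch of the same mmap idea that can be run end to end (the function name `check_mmap` and the throwaway temp file are mine, purely for the demo; `access=mmap.ACCESS_READ` is the cross-platform way to request a read-only mapping):

```python
import mmap
import os
import tempfile

def check_mmap(word1, word2, path):
    """Return 1 if 'word1 word2' appears on any line of the file, else -1."""
    phrase = (word1 + " " + word2).encode()  # the mmap yields bytes, not str
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the OS pages it in as it is scanned
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            line = m.readline()
            while line:
                if phrase in line:
                    return 1
                line = m.readline()
    return -1

# Quick demo against a throwaway file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("the quick brown fox\njumps over the lazy dog\n")
print(check_mmap("quick", "brown", path))  # 1
print(check_mmap("lazy", "fox", path))     # -1
os.remove(path)
```

Note that the phrase must be encoded to bytes, because a file opened in `'rb'` mode (which mmap requires) yields `bytes` lines, and comparing `str` against `bytes` never matches in Python 3.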