Optimizing find-and-match code in Python


python optimization


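I have code that takes two input files: (1) a lexicon and (2) a text file with one sentence per line.

The first part of the code reads the lexicon as tuples, so it produces output like this: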

('mthy3lkw', 'weakBelief', 'U')

('mthy3lkm', 'firmBelief', 'B')

('mthy3lh', 'notBelief', 'A')

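The second part of the code searches every sentence in the text file for the word at position 0 of each tuple, then prints the sentence, the search word, and its type.

So, given the sentence mthy3lkw ana mesh 3arif, the expected output is:

["mthy3lkw ana mesh 3arif", "mthy3lkw", "weakBelief", "U"]

assuming the highlighted word is found in the lexicon.

The second part of my code, the matching part, is too slow. How can I make it faster?

Here is my code: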

import re

findings = []
for sentence in data:  # data holds the lines of the sentences file, read with .readlines()
    for word in tuples:  # lexicon tuples like the ones shown above
        p1 = re.compile(r'\b%s\b' % word[0])  # whole-word pattern for the first item in the tuple
        if p1.findall(sentence) and word[1] == "firmBelief":
            findings.append([sentence, word[0], "firmBelief"])

print(findings)

Convert the list of tuples to a trie, and use that for searching.

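Build a dict lookup structure so that you can quickly find the right tuple for a given word. You can then restructure your loops: instead of reading through the whole lexicon for every sentence and trying to match each entry, go through each word in the sentence once and look it up in the dict: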

import re

# Create a lookup structure keyed by the word at position 0 of each tuple
word_dictionary = {entry[0]: entry for entry in tuples}

findings = []
word_re = re.compile(r'\b\S+\b') # only need to create the regexp once
for sentence in data:
    for word in word_re.findall(sentence): # Check every word in the sentence
        if word in word_dictionary: # A match was found
            entry = word_dictionary[word]
            findings.append([sentence, word, entry[1], entry[2]])
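
To try the dict-based approach end to end, here is a minimal, self-contained sketch. The `tuples` and `data` values are small hard-coded samples based on the question rather than the real files, and `find_matches` is a hypothetical helper name used only for illustration.

```python
import re

# Sample lexicon entries from the question: (word, belief type, tag).
tuples = [
    ("mthy3lkw", "weakBelief", "U"),
    ("mthy3lkm", "firmBelief", "B"),
    ("mthy3lh", "notBelief", "A"),
]

# Stand-in for the lines of the sentences file (assumed sample data).
data = ["mthy3lkw ana mesh 3arif", "ana mesh 3arif"]

WORD_RE = re.compile(r"\b\S+\b")  # compiled once, reused for every sentence


def find_matches(sentences, entries):
    """Return [sentence, word, type, tag] for every lexicon word found."""
    lookup = {entry[0]: entry for entry in entries}
    findings = []
    for sentence in sentences:
        for word in WORD_RE.findall(sentence):
            entry = lookup.get(word)  # average constant-time dict lookup
            if entry is not None:
                findings.append([sentence, word, entry[1], entry[2]])
    return findings


findings = find_matches(data, tuples)
print(findings)
```

With this structure the work per sentence grows with the number of words in the sentence, not with the size of the lexicon, which is where the original nested loop spent its time.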

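Can you expand your answer to help the OP understand what a trie is and how it would help speed up the search?

I'm just starting out with Python, so I don't know what a trie is.

@Sabba: a trie is not a Python thing, it's a data-structure thing. It allows fast searching because similar words are kept together in a tree-like structure.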
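
As a rough illustration of the trie idea mentioned in the comments, here is a minimal sketch. The `TrieNode` and `Trie` classes and their method names are hypothetical, written for this example only, and the lexicon entries are the samples from the question. Each node maps the next character to a child node, so words that share a prefix (such as mthy3lkw and mthy3lkm) share a path in the tree.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.entry = None   # lexicon tuple stored at the node where a word ends


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, entry):
        """Walk or create one node per character, then store the entry."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.entry = entry

    def find(self, word):
        """Return the stored entry for an exact word, or None."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.entry


trie = Trie()
for entry in [
    ("mthy3lkw", "weakBelief", "U"),
    ("mthy3lkm", "firmBelief", "B"),
    ("mthy3lh", "notBelief", "A"),
]:
    trie.insert(entry[0], entry)

sentence = "mthy3lkw ana mesh 3arif"
# Look up each whitespace-separated word; keep only the ones in the lexicon.
matches = [trie.find(w) for w in sentence.split() if trie.find(w) is not None]
print(matches)
```

For exact whole-word lookups like the ones in the question, a plain dict gives the same result with less code. A trie becomes more useful when prefix queries are needed, for example finding every lexicon word that starts with mthy3l.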