How to optimize this Python code (from ThinkPython, Exercise 10.10)

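I'm working through Allen Downey's "How to Think Like a Computer Scientist", and I've written what I believe to be a functionally correct solution to Exercise 10.10. But it took just over 10 hours (!) to run, so I'm wondering whether I'm missing some really obvious and helpful optimization.

Here's the exercise:

"Two words 'interlock' if taking alternating letters from each forms a new word. For example, 'shoe' and 'cold' interlock to form 'schooled'. Write a program that finds all pairs of words that interlock. Hint: Don't enumerate all pairs!"

(For these word-list problems, Downey has supplied a file with 113809 words. We may assume that these words are in a list, one word per item in the list.)

Here's my solution: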

from bisect import bisect_left

def index(lst, target):
    """If target is in list, returns the index of target; otherwise returns None"""
    i = bisect_left(lst, target)
    if i != len(lst) and lst[i] == target:
        return i
    else:
        return None

def interlock(str1, str2):
    "Takes two strings of equal length and 'interlocks' them."
    if len(str1) == len(str2):
        lst1 = list(str1)
        lst2 = list(str2)
        result = []
        for i in range(len(lst1)):
            result.append(lst1[i])
            result.append(lst2[i])
        return ''.join(result)
    else:
        return None

def interlockings(word_lst):
    """Checks each pair of equal-length words to see if their interlocking is a word; prints each successful pair and the total number of successful pairs."""
    total = 0
    for i in range(1, 12):  # 12 because max word length is 22
        # to shorten the loops, get a sublist of words of equal length
        sub_lst = filter(lambda(x): len(x) == i, word_lst)
        for word1 in sub_lst[:-1]:
            for word2 in sub_lst[sub_lst.index(word1)+1:]: # pair word1 only with words that come after word1
                word1word2 = interlock(word1, word2) # interlock word1 with word2
                word2word1 = interlock(word2, word1) # interlock word2 with word1
                if index(lst, word1word2): # check to see if word1word2 is actually a word
                    total += 1
                    print "Word 1: %s, Word 2: %s, Interlock: %s" % (word1, word2, word1word2)
                if index(lst, word2word1): # check to see if word2word1 is actually a word
                    total += 1
                    print "Word 2, %s, Word 1: %s, Interlock: %s" % (word2, word1, word2word1)
    print "Total interlockings: ", total

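The print statements are not the problem; my program only found 652 such pairs. The problem is the nested loops, right? I mean, even though I'm looping over lists that contain only words of the same length, there are (for example) 21727 words of length 7, which means my program has to check over 400 million "interlockings" to see whether they are actual words, and that's just for the words of length 7.

So again, this code took 10 hours to run (and found no pairs involving words of length 5 or greater, in case you were curious). Is there a better way to solve this problem?

Thanks in advance for any and all insight. I'm aware that "premature optimization is the root of all evil", and perhaps I've fallen into that trap already, but in general, while I can usually write code that runs correctly, I often struggle to write code that runs well.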

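Do it the other way around: iterate through all words and split each one into two words by taking its odd and even letters. Then look up those two words in the dictionary.

As a side note, two words that interlock do not necessarily have the same length; their lengths may also differ by 1.

Some (untested) code: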
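A minimal sketch of this reverse approach, written in Python 3 and assuming the word list is already in memory. The function name find_interlocked and the five sample words are illustrative, not taken from the answer. Slicing with a step of 2 also covers the case where the two words differ in length by 1, because word[::2] simply comes out one letter longer than word[1::2] for odd-length words.

```python
def find_interlocked(words):
    """Return (first, second, combined) for every word that splits into two words.

    `words` is any iterable of strings; the name and return shape are
    assumptions made for this sketch.
    """
    word_set = set(words)  # set membership is a hash lookup, not a scan
    found = []
    for word in sorted(word_set):
        evens = word[::2]   # letters at positions 0, 2, 4, ...
        odds = word[1::2]   # letters at positions 1, 3, 5, ...
        if evens in word_set and odds in word_set:
            found.append((evens, odds, word))
    return found


# Tiny stand-in for the 113809-word file.
sample = ["shoe", "cold", "schooled", "so", "he"]
print(find_interlocked(sample))
# [('shoe', 'cold', 'schooled'), ('so', 'he', 'shoe')]
```

Each word costs two slices and two set lookups, so the total work grows with the number of words rather than with the number of pairs.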

Alternative definition of interlock:

import itertools

def interlock(str1, str2):
    "Takes two strings of equal length and 'interlocks' them."
    return ''.join(itertools.chain(*zip(str1, str2)))
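Because zip stops at the shorter argument, the definition above drops the last letter when the first word is one letter longer. A possible variant for the "lengths may differ by 1" case mentioned in the side note, sketched in Python 3 with itertools.zip_longest. The name interlock_uneven is hypothetical, and returning None for other length differences is a choice made here, not something the answer specifies.

```python
from itertools import chain, zip_longest


def interlock_uneven(str1, str2):
    """Interleave str1 and str2; str1 may be equal in length or one letter longer.

    Returns None for any other length difference (an assumption of this sketch).
    """
    if len(str1) - len(str2) not in (0, 1):
        return None
    # zip_longest pads the shorter word with '' so the last letter is kept.
    pairs = zip_longest(str1, str2, fillvalue="")
    return "".join(chain.from_iterable(pairs))


print(interlock_uneven("shoe", "cold"))  # schooled
print(interlock_uneven("ace", "bd"))     # abcde
```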

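One important thing is your index function: it is the function that runs more often than any other. When you don't need the index of the found word, why define a function to find that index?

if word1word2 in lst:

is enough, instead of

if index(lst, word1word2):

The same goes for

if index(lst, word2word1):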

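Hmm. OK, bisection really does work faster than the in syntax. To improve the speed a bit more, I suggest calling the bisect_left function directly inside your interlockings function.

For example, instead of: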
        if index(lst, word1word2): # check to see if word1word2 is actually a word
            total += 1
            print "Word 1: %s, Word 2: %s, Interlock: %s" % (word1, word2, word1word2)

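use a direct call, q = bisect_left(lst, word1word2), and then test q != len(lst) and lst[q] == word1word2 before counting and printing the pair (the full snippet is listed after the comments).

This gives a small improvement in speed.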
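To make the comparison between the membership tests discussed on this page concrete, here is a rough timing sketch in Python 3. The data is synthetic (50,000 made-up strings rather than Downey's word file), the helper name in_sorted is hypothetical, and the timings will vary by machine, so no figures are quoted. The helper returns a bool on purpose: the question's if index(lst, ...) check treats a match at index 0 as "not found", because 0 is falsy.

```python
from bisect import bisect_left
import timeit


def in_sorted(lst, target):
    """Binary-search membership test on a sorted list; always returns a bool."""
    i = bisect_left(lst, target)
    return i != len(lst) and lst[i] == target


# Synthetic stand-in data (an assumption of this sketch).
words_list = sorted("w%06d" % n for n in range(50000))
words_set = set(words_list)
probe = words_list[-1]  # last item: the worst case for a linear scan

tests = [
    ("list 'in' (linear scan)", lambda: probe in words_list),
    ("bisect_left (binary search)", lambda: in_sorted(words_list, probe)),
    ("set 'in' (hash lookup)", lambda: probe in words_set),
]
for label, func in tests:
    seconds = timeit.timeit(func, number=100)
    print("%-30s %.6f s for 100 lookups" % (label, seconds))
```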
Another version:
with open('words.txt') as inf:
    words = set(wd.strip() for wd in inf)

word_gen = ((word, word[::2], word[1::2]) for word in words)
interlocked = [word for word,a,b in word_gen if a in words and b in words]

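On my machine this runs in 0.16 seconds and returns 1254 words.

Edit: as @John Machin pointed out, this can be improved further by lazy evaluation: test word[::2] in words and word[1::2] in words directly in the list comprehension, so that the second slice is only computed if the first one yields a valid word.

This cuts the execution time by a third, to 0.104 seconds.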
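Thanks! I'll try to implement this today and see if it helps. Regarding your side note: I thought of that and decided it was too complicated for a first pass; once I had the equal-length case working, I would go back and consider the other case. Once I've successfully implemented your suggestion, I'll incorporate that idea.

Holy cow!! The elapsed time went from 10 hours to 15.6 seconds. That includes the difference-by-1 case in the new implementation (which was trivial to implement). Wow! Thanks a lot!

I think this is the right approach, but the results are not correct. That is, you are splitting one word into its even and odd parts. The problem as stated is to combine two words into a new word.

@drewk: The problem as stated is: "Write a program that finds all pairs of words that interlock." The code above does exactly that.

@drewk: If the list of all words contains only "cold" and "shoe", then "schooled" is by definition not a word, and "cold" and "shoe" do not interlock. For this case, the code above correctly prints nothing.

Downey suggests earlier in the book that the "item in list" syntax runs slower than the bisection algorithm used by my "index" function. I admit I haven't tested his assertion to see whether my index function really runs faster than the built-in "in" syntax, so maybe I'll test that later.

@Alex: You're right that your index() function is faster than Hossein's suggestion of word in lst. Faster than both is to use s = set(words) and test word in s.

The answers below make a lot of sense, but I'm interested: did you try profiling this code to find out exactly what makes it slow? Other things that can help speed it up are running it through Psyco or PyPy.

@Glenjamin: I didn't profile the code because I don't know how to do profiling. Can you provide a link to some documentation that explains how to do it? Thanks.

Explained much better than I could :) Short version: python -m cProfile myscript.py

This does not work. If words is set(['cold', 'shoe']) (the OP's example), word_gen is [('cold', 'cl', 'od'), ('shoe', 'so', 'he')].

@drewk: If words is set(['cold', 'shoe']), then there is no pair of interlocking words, and the code above finds none. If words is set(['cold', 'shoe', 'schooled']), then there is a pair of interlocking words, and the code above finds it.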
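For the profiling question raised in the comments, the command-line form python -m cProfile myscript.py has an in-program equivalent in the standard library. Below is a small Python 3 sketch of it; the function naive_pairs and the three-word list are made up for illustration and only imitate the shape of the question's nested loops.

```python
import cProfile
import io
import pstats


def naive_pairs(word_list):
    """Toy version of the pairwise approach: try every ordered pair of words."""
    word_set = set(word_list)
    found = []
    for a in word_list:
        for b in word_list:
            if len(a) == len(b):
                combined = "".join(x + y for x, y in zip(a, b))
                if combined in word_set:
                    found.append((a, b, combined))
    return found


profiler = cProfile.Profile()
profiler.enable()
result = naive_pairs(["shoe", "cold", "schooled"])
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(result)
print(stream.getvalue())
```

On the full word list, a report like this would be expected to show where the time goes per function, which is the information the commenter was asking about.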
Snippet for the bisect_left suggestion (replaces the index() call):

        q = bisect_left(lst, word1word2)
        if q != len(lst) and lst[q] == word1word2:
            total += 1
            print "Word 1: %s, Word 2: %s, Interlock: %s" % (word1, word2, word1word2)

Snippet for the lazy version (the second slice is only evaluated when the first is a word):

with open('words.txt') as inf:
    words = set(wd.strip() for wd in inf)
interlocked = [word for word in words if word[::2] in words and word[1::2] in words]