Python: how many common English words of 4 letters or more can you make from the letters of a given word (each letter can only be used once)?

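On the back of a block calendar I found the following riddle:

How many common English words of 4 letters or more can you make from the letters of the word "textbook" (each letter can only be used once)?

The first solution I came up with was: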

from itertools import permutations

with open('/usr/share/dict/words') as f:
    words = f.readlines()

words = map(lambda x: x.strip(), words)

given_word = 'textbook'

found_words = []

ps = (permutations(given_word, i) for i in range(4, len(given_word)+1))

for p in ps:
    for word in map(''.join, p):
        if word in words and word != given_word:
            found_words.append(word)
print set(found_words)

This gives the result

set(['tote', 'oboe', 'text', 'boot', 'take', 'toot', 'book', 'toke', 'betook'])

but it took more than 7 minutes on my machine.

My next iteration was:

with open('/usr/share/dict/words') as f:
    words = f.readlines()

words = map(lambda x: x.strip(), words)

given_word = 'textbook'

print [word for word in words if len(word) >= 4 and sorted(filter(lambda letter: letter in word, given_word)) == sorted(word) and word != given_word]

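This returns almost immediately, but gives as its answer:

['book', 'oboe', 'text', 'toot']

What is the fastest, most correct and most elegant solution to this problem?

(edit: added my earlier permutation solution and its different output.)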

There is a generator, itertools.permutations, which you can use to collect all permutations of a sequence with a specified length. That makes it easier:

from itertools import permutations

GIVEN_WORD = 'textbook'

with open('/usr/share/dict/words', 'r') as f:
    words = [s.strip() for s in f.readlines()]

print len(filter(lambda x: ''.join(x) in words, permutations(GIVEN_WORD, 4)))

Edit #1: Oh! It says "4 or more" ;) Forget what I said!

Edit #2: This is the second version I came up with:

LETTERS = set('textbook')

with open('/usr/share/dict/words') as f:
    WORDS = filter(lambda x: len(x) >= 4, [l.strip() for l in f])

matching = filter(lambda x: set(x).issubset(LETTERS) and all([x.count(c) == 1 for c in x]), WORDS)
print len(matching)

How about this?

from itertools import permutations, chain

with open('/usr/share/dict/words') as fp:
    words = set(fp.read().split())

given_word = 'textbook'

perms = (permutations(given_word, i) for i in range(4, len(given_word)+1))
pwords = (''.join(p) for p in chain(*perms))
matches = words.intersection(pwords)

print matches

which gives

>>> print matches
set(['textbook', 'keto', 'obex', 'tote', 'oboe', 'text', 'boot', 'toto', 'took', 'koto', 'bott', 'tobe', 'boke', 'toot', 'book', 'bote', 'otto', 'toke', 'toko', 'oket'])
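
One likely reason this version is so much faster than the question's first attempt is the data structure: each candidate is looked up in a set instead of being searched for in a list. The sketch below only counts how many candidates the permutation approach generates. It assumes Python 3.8+ for math.perm, and the names candidate_count and distinct are illustrative, not from the thread.

```python
from itertools import permutations
from math import perm  # available from Python 3.8

given_word = 'textbook'
n = len(given_word)

# Ordered selections of 4..8 letters out of the 8 letter positions.
candidate_count = sum(perm(n, k) for k in range(4, n + 1))

# 'textbook' has two t's and two o's, so many candidates spell the same
# string; collecting them in a set shows how many are actually distinct.
distinct = {''.join(p)
            for k in range(4, n + 1)
            for p in permutations(given_word, k)}

print(candidate_count, len(distinct))
```

With a set, each candidate costs roughly one hash lookup; with a list, each `in` test may scan the whole word list.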

Create the whole power set, then check whether each dictionary word is in that set (the order of the letters doesn't matter):
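
A minimal sketch of that idea, not the answerer's original code: build every selection of 4 or more letters once, store each in sorted order, and test the sorted letters of each dictionary word against that set. The helper names and the tiny sample_words list are assumptions made for illustration; the thread itself reads /usr/share/dict/words.

```python
from itertools import combinations

def sub_multisets(letters, min_len=4):
    """Return every sorted selection of at least min_len of the letters."""
    letters = sorted(letters)
    found = set()
    for size in range(min_len, len(letters) + 1):
        # Combinations of a sorted sequence come out sorted as well.
        found.update(combinations(letters, size))
    return found

def find_words(given_word, words, min_len=4):
    selections = sub_multisets(given_word, min_len)
    return sorted(w for w in words
                  if w != given_word and tuple(sorted(w)) in selections)

# Stand-in word list (an assumption, used only to keep the example small).
sample_words = ['book', 'boot', 'oboe', 'text', 'toke', 'take', 'textbook', 'bee']
print(find_words('textbook', sample_words))
```

Because the selections are stored sorted, each dictionary word needs only one sort and one set lookup.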

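I thought I'd share this slightly interesting trick, although it takes a good deal more code than the other solutions and isn't really "pythonic". Judging by the timings the others report, it should still be rather quick.

We do a bit of preprocessing to speed up the computation. The basic approach is the following: we assign every letter of the alphabet a prime number, e.g. A = 2, B = 3, and so on. We then compute a hash for every word in the word list, which is simply the product of the prime representations of every character in the word. Finally we store every word in a dictionary indexed by that hash.

Now if we want to find out which words are equivalent to, say, textbook, we only have to compute the same hash for that word and look it up in our dictionary. Usually (say in C++) we would have to worry about overflow, but in Python it is even simpler than that: every word in the list stored under the same index contains exactly the same characters.

The code uses one slight optimization: in our case we only have to worry about characters that also appear in the given word, which means we can get away with a much smaller prime table than otherwise. (The obvious optimization would be to assign a value only to the characters that appear in the word. It was fast enough anyway, so I didn't bother, and this way we can preprocess once and then query several words.) The underlying algorithm is useful often enough that you should have one of your own anyway ;)

It runs on my Ubuntu word list (98K words), but it is not what I would call Pythonic, since it is basically a port of a C++ algorithm. It is useful if you want to compare more than one word this way.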
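
The description above could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: it assigns primes to all 26 lowercase letters, skips words containing anything else, and uses math.prod (Python 3.8+). The names PRIME_OF, word_hash, build_index, find_words and sample_words are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import prod  # available from Python 3.8

# The first 26 primes, one per lowercase letter (a=2, b=3, c=5, ...).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
PRIME_OF = {chr(ord('a') + i): p for i, p in enumerate(PRIMES)}

def word_hash(letters):
    """Product of the letters' primes; two words share it only if they are anagrams."""
    return prod(PRIME_OF[c] for c in letters)

def build_index(words):
    """Preprocess once: map each hash to the words that produce it."""
    index = defaultdict(list)
    for w in words:
        if w and all(c in PRIME_OF for c in w):
            index[word_hash(w)].append(w)
    return index

def find_words(given_word, index, min_len=4):
    found = set()
    for size in range(min_len, len(given_word) + 1):
        for letters in combinations(given_word, size):
            # .get avoids creating empty entries in the defaultdict.
            found.update(index.get(word_hash(letters), []))
    found.discard(given_word)
    return sorted(found)

# Stand-in word list (an assumption, used only to keep the example small).
sample_words = ['book', 'boot', 'oboe', 'text', 'toke', 'take', 'textbook', 'bee']
index = build_index(sample_words)
print(find_words('textbook', index))
```

Under the same encoding, unique factorization also allows a direct test without the index: a word can be built from the given letters exactly when its hash divides the hash of the given word.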

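The following just checks every word in the dictionary to see whether it has an appropriate length, and then whether it can be formed from the letters of "textbook". I borrowed the permutation check and changed it slightly.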

given_word = 'textbook'

with open('/usr/share/dict/words', 'r') as f:
    words = [s.strip() for s in f.readlines()]

matches = []
for word in words:
    if word != given_word and 4 <= len(word) <= len(given_word):
        if all(word.count(char) <= given_word.count(char) for char in word):
            matches.append(word)
print sorted(matches)
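
The word.count / given_word.count test above rescans both strings for every character. The same check can be expressed with collections.Counter; the version below is a sketch of that variant rather than part of the original answer, and the names can_build, find_words and sample_words are illustrative.

```python
from collections import Counter

def can_build(word, available):
    """True if no letter occurs in word more often than in available."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means word never asks for more of a letter than is available.
    return not (Counter(word) - available)

def find_words(given_word, words, min_len=4):
    available = Counter(given_word)
    return sorted(w for w in words
                  if w != given_word
                  and min_len <= len(w) <= len(given_word)
                  and can_build(w, available))

# Stand-in word list (an assumption, used only to keep the example small).
sample_words = ['book', 'boot', 'oboe', 'text', 'toke', 'take', 'textbook', 'bee']
print(find_words('textbook', sample_words))
```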

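For longer words, the number of permutations becomes very large. Try counterrevolutionary, for example.

I would filter the dict for words of length 4 to len(word) (8 for textbook).

Then I would filter with a regular expression, along the lines of "oboe".matches("[textbook]+").

I would sort the remaining words and compare each one with the sorted version of your word ("beoo", "bekoottx"), jumping to the next index of a matching character, to find mismatching numbers of characters: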
("beoo", "bekoottx") 
("eoo", "ekoottx") 
("oo", "koottx") 
("oo", "oottx") 
("o", "ottx") 
("", "ttx") => matched


("bbo", "bekoottx") 
("bo", "ekoottx") => mismatch

Since I don't speak Python, I leave the implementation as an exercise for the audience.
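
One possible Python rendering of the sorted comparison traced above, offered as a sketch rather than as the answerer's implementation (the function name fits is hypothetical, and the length and regex pre-filters are left out):

```python
def fits(word, given_word):
    """True if the sorted letters of word can all be matched in sorted(given_word)."""
    small, big = sorted(word), sorted(given_word)
    j = 0
    for ch in small:
        # Jump ahead to the next position in big that could hold ch.
        while j < len(big) and big[j] < ch:
            j += 1
        if j == len(big) or big[j] != ch:
            return False  # ch is missing or its supply is used up
        j += 1  # consume the matched letter
    return True

print(fits('oboe', 'textbook'))  # follows the ("beoo", "bekoottx") trace
print(fits('bob', 'textbook'))   # follows the ("bbo", "bekoottx") trace
```

Each comparison walks both sorted strings once, so no permutations are generated at all.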
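
Deleted my answer before I saw your comment, for the same reason you pointed out. Thanks.

You can solve this very efficiently by doing some preprocessing on the dict and assigning each letter a prime number representation. If I have time later, I'll write up a solution.

@Voo I'll wait to accept an answer until you have submitted your solution. I'm looking forward to it.

This question appears to be off-topic because it is about a programming puzzle.

Haven't looked into it that deeply yet, but this code gives me different results and takes more than 20 times longer to execute.

Note that my version only cares about matching words that contain each letter once.

Your second version only returns words in which every letter is different (in this case: toke), but tote, oboe, text, boot, take, toot, book and betook are also valid solutions.

Quote: "each letter can only be used once". ;)

@Gandaro Fine, but textbook contains two ts, each of which can be used once ;)

Cool, I had never heard of the concept of a power set before. Minor nitpick: your current implementation doesn't filter for words of length 4 or more.

Very clear explanation and code, thanks. Your code also returns the valid words kobe, otto and toto, and now I'm wondering why my permutation solution didn't show those words.

@BioGeek You aren't treating upper and lower case the same.

No lambda, no map, no filter: finally, this is Pythonic. Although using a generator expression instead of the list comprehension and the accumulating loop should be more efficient.

@Evpok I can understand (and agree with) why map and filter are unnecessary given list comprehensions, but I don't see why creating dozens of mini functions instead of using lambdas would be especially pythonic?

See: "[…] once map(), filter()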