String: find all combinations of words that form a word
Tags: string, algorithm, combinations, dynamic-programming

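I have a list of words. Some of the words can be formed by concatenating two or more other words from the list, and I have to return all such combinations.

Input:

words = ["leetcode", "leet", "code", "le", "et", "etcode", "de", "decode", "deet"]

Output:

("leet", "code"), ("le", "et", "code"), ("de", "code"), etc.

What I have tried:

1) Trying every possible combination takes far too long, so that is a bad idea.

2) I sense some form of dynamic programming here; for example, I could reuse the solution for "leet" when solving "leetcode". But I can't express it precisely in pseudocode. How should I go about it?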

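A simple approach:
Sort the word list.
For each word A (leetcode), use binary search to find the range of words that are prefixes of A (le, leet).
For each valid prefix, repeat the search on the rest of the word (that is, look up etcode and code), and so on.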
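The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the helper names `prefixes_in` and `decompose` are made up here, and instead of locating a single range of prefixes it runs one binary search (via the standard `bisect` module) per prefix length, which is simpler to write down.

```python
from bisect import bisect_left


def prefixes_in(sorted_words, text):
    """Return every word in sorted_words that is a prefix of text."""
    found = []
    for end in range(1, len(text) + 1):
        candidate = text[:end]
        # binary search for the candidate prefix in the sorted list
        i = bisect_left(sorted_words, candidate)
        if i < len(sorted_words) and sorted_words[i] == candidate:
            found.append(candidate)
    return found


def decompose(sorted_words, text):
    """All ways to write text as a sequence of listed words (text itself included)."""
    if not text:
        return [[]]
    ways = []
    for prefix in prefixes_in(sorted_words, text):
        # repeat the search on the rest of the word
        for rest in decompose(sorted_words, text[len(prefix):]):
            ways.append([prefix] + rest)
    return ways


words = sorted(["leetcode", "leet", "code", "le", "et", "etcode", "de", "decode", "deet"])
for way in decompose(words, "leetcode"):
    if len(way) >= 2:  # skip the trivial "combination" that is the word itself
        print(way)
```

There is no caching here, so the same remainder (for example code) may be searched more than once.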

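Can each word be used only once? For example, if you have de and dede, is (de, de) an answer?

For simplicity, I assume that each word appears only once in the list, and that you have many words but no memory constraints.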

1 - Build a custom tree in which each node looks like this:

class node():
    def __init__(self):
        self.is_there_a_word_that_ends_here = False  # set to True when a word ends at this node
        self.children = dict()  # other nodes, key is the letter of the node

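For example, if you have four words such as ["ab", "abc", "ade", "c"], you get a tree whose root has the children a and c*; a has the children b* and d; b* has the child c*; and d has the child e* (a * marks a node where a word ends, i.e. the node's flag is True).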
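To make the tree for this example concrete, here is a small sketch that builds it and prints it with the * markers. The field names follow the node class above; `Node`, `build_trie` and `render` are names invented for this illustration.

```python
class Node:
    def __init__(self):
        self.is_there_a_word_that_ends_here = False
        self.children = {}  # letter -> Node


def build_trie(words):
    """Insert every word letter by letter and flag the node where it ends."""
    root = Node()
    for word in words:
        node = root
        for letter in word:
            node = node.children.setdefault(letter, Node())
        node.is_there_a_word_that_ends_here = True
    return root


def render(node, depth=0):
    """Return the tree as indented lines; '*' marks a node where a word ends."""
    lines = []
    for letter in sorted(node.children):
        child = node.children[letter]
        mark = "*" if child.is_there_a_word_that_ends_here else ""
        lines.append("  " * depth + letter + mark)
        lines.extend(render(child, depth + 1))
    return lines


trie = build_trie(["ab", "abc", "ade", "c"])
print("\n".join(render(trie)))
# a
#   b*
#     c*
#   d
#     e*
# c*
```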

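2 - Group the words by length. Start with the shortest words, because by the time you reach the longer words you want to already know the "breakdowns" of the shorter ones. You can do this recursively with a function, say add_word_to_result, which can (and should) cache its results: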

results = dict() # keys: possible words you can reach, values: ways to reach them
for word in words_in_ascending_length:
   add_word_to_result(word, tree, results)

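add_word_to_result then starts walking through the tree. Whenever it reaches a node where is_there_a_word_that_ends_here is set, it calls add_word_to_result(remaining_part_of_the_word, tree, results). For example, if you have "abc", you will see the * at "ab" and then call add_word_to_result("c", tree, results).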

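Implementing the recursive function is the "fun part" of the problem (and the more time-consuming part), so I leave it to you. As a bonus, you can also come up with an efficient way to avoid adding duplicates to the results (because duplicates will occur in some cases).

(Edit: if this sentence makes sense, you may want to cache the decompositions of existing words and of non-existing words, e.g. the decompositions of word endings, separately, so that you don't have to separate them before returning the results.)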
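For reference, one possible shape for the cached recursive function described above. This is a sketch under assumptions, not the answer's own code: it keeps the add_word_to_result(word, tree, results) signature, stores each decomposition as a list of words, and caches words and non-word remainders in the same results dict; `Node` and `build_tree` are names made up for the illustration.

```python
class Node:
    def __init__(self):
        self.is_there_a_word_that_ends_here = False
        self.children = {}  # letter -> Node


def build_tree(words):
    root = Node()
    for word in words:
        node = root
        for letter in word:
            node = node.children.setdefault(letter, Node())
        node.is_there_a_word_that_ends_here = True
    return root


def add_word_to_result(word, tree, results):
    """Return every way to split `word` into listed words, caching it in `results`."""
    if word in results:
        return results[word]
    ways = []
    node = tree
    for i, letter in enumerate(word):
        node = node.children.get(letter)
        if node is None:
            break  # no listed word starts with word[:i + 1]
        if node.is_there_a_word_that_ends_here:
            head, rest = word[:i + 1], word[i + 1:]
            if not rest:
                ways.append([head])
            else:
                # reuse (or compute once) the decompositions of the remainder
                for tail in add_word_to_result(rest, tree, results):
                    ways.append([head] + tail)
    results[word] = ways
    return ways


words = ["leetcode", "leet", "code", "le", "et", "etcode", "de", "decode", "deet"]
tree = build_tree(words)
results = {}
for word in sorted(words, key=len):
    add_word_to_result(word, tree, results)

# keep only the splits that use two or more words
compound = {w: [c for c in results[w] if len(c) >= 2] for w in words}
print(compound["leetcode"])
```

Because the words are processed in ascending length, the remainders of a long word that are themselves listed words are already in the cache when they are needed.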

I hope this helps.

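What is the purpose of the trie? Wouldn't a set be better? It would have O(1) lookup time.

If we assume hashing is instantaneous, yes. But I assume hashing has to read every character of each word. The tree does that as well, so I don't see a significant advantage to using a set here, but I'm not an expert, so if there are any objections I'd love to hear them. The reason I used a tree is that you can easily find the words that share a beginning with the word you are interested in. In a set there is no connection between similar words, so you would need to search, or build another index.

@AjinkyaGawali see the edit :) If you can send me the leetcode link, I'd like to check whether I have missed anything.

Bonus: sample code. It has not really been tested, but it should work. There is a significant improvement that I'm too lazy to make right now: you can change the structure a little and pass results into add_word_to_result, so that you remember all the combinations found so far. Then, instead of calling add_word_to_result(head, head, words_left[1:], combinations, words_passed+words_left[0]+","), you just use the stored combinations and avoid unnecessary recursion.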
words = ["leetcode", "leet", "code", "le", "et", "etcode", "de", "decode", "deet"]


class node():
    def __init__(self, letter, is_there_a_word_that_ends_here):
        self.letter = letter # not really used but it feels weird to not have it in class
        self.is_there_a_word_that_ends_here = is_there_a_word_that_ends_here
        self.children = dict()


# actually defining tree is redundant you can just merge tree and node class together, but maybe this is more explicit
class Tree():
    def __init__(self):
        self.head = node(None, False)

    def add(self, word, head=None):
        if head is None:
            head=self.head

        if word[0] not in head.children.keys():
            head.children[word[0]] = node(word[0], False)

        if len(word) == 1:
            head.children[word[0]].is_there_a_word_that_ends_here = True
        else:
            self.add(word[1:], head=head.children[word[0]])


words = sorted(words, key=lambda w: len(w))
results = dict()
tree = Tree()
for word in words:
    tree.add(word)


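def add_word_to_result(head, current_node, words_left, combinations, words_passed):
    # head: root of the tree; current_node: position inside the word currently being matched
    # words_left: letters not consumed yet; words_passed: comma-separated words matched so far
    if not words_left:
        # ran out of letters in the middle of a word, so this path is a dead end
        # (without this check, words_left[0] below raises IndexError on such paths)
        return
    if words_left[0] in current_node.children:
        # the lookup can fail, because this function is also called on remainders that are not in the list
        next_node = current_node.children[words_left[0]]
        if len(words_left) == 1 and next_node.is_there_a_word_that_ends_here:
            combinations.append(words_passed + words_left)
        elif next_node.is_there_a_word_that_ends_here:
            # a word ends here: either start the next word from the root ...
            add_word_to_result(head, head, words_left[1:], combinations, words_passed + words_left[0] + ",")
            # ... or keep extending the current word
            add_word_to_result(head, next_node, words_left[1:], combinations, words_passed + words_left[0])
        else:
            add_word_to_result(head, next_node, words_left[1:], combinations, words_passed + words_left[0])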


for word in words:
    results[word] = []
    add_word_to_result(tree.head, tree.head, word, results[word], "")

print(results)

# {'le': ['le'], 'et': ['et'], 'de': ['de'], 'leet': ['le,et', 'leet'], 'code': ['code'], 'deet': ['de,et', 'deet'], 'etcode': ['et,code', 'etcode'], 'decode': ['de,code', 'decode'], 'leetcode': ['le,et,code', 'le,etcode', 'leet,code', 'leetcode']}