Python: optimizing "sentences that contain all the given phrases"

python arrays string python-3.x algorithm

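I was reading an article on GeeksforGeeks that has a problem called "Sentences that contain all the given phrases".

The details are as follows: given a list of sentences and a list of phrases, the task is to find which sentences contain all the words of a phrase and, for each phrase, print the numbers of the sentences that contain it.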

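For example, input:

sent = ["Strings are an array of characters",
        "Sentences are an array of words"]
ph = ["an array of", "sentences are strings"]

Output: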

Phrase1:
1 2
Phrase2:
NONE

Code:

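This is correct, but I'd like to know how to optimize it to reduce the time complexity, or simply make it run faster. I'm also working on a similar problem, which is why I'm asking. Thanks a lot if you can help.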

You can use the Counter class from the collections module to make the logic much simpler:

from collections import Counter

def contains(sentence, phrase):
    # True if the sentence has at least as many copies of every phrase word
    return all(sentence[word] >= phrase[word] for word in phrase)

sent = ["Strings are an array of characters",
        "Sentences are an array of words"]
ph = ["an array of", "sentences are strings"]

# Count the (lowercased) words of every sentence and phrase once, up front
sent = [Counter(word.lower() for word in sentence.split()) for sentence in sent]
ph   = [Counter(word.lower() for word in sentence.split()) for sentence in ph]

for i, phrase in enumerate(ph, start=1):
    print("Phrase{}:".format(i))
    # 1-based numbers of the sentences that contain every word of the phrase
    matches = [j for j, sentence in enumerate(sent, start=1) if contains(sentence, phrase)]
    if not matches:
        print("NONE")
    else:
        print(*matches)

This lets us count the words of each sentence once, instead of once per phrase.
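A possibly more compact way to write the same containment check, assuming Python 3: subtracting one `Counter` from another keeps only positive counts, so `phrase - sentence` is empty exactly when the sentence has at least as many copies of every phrase word. The helper names `contains_sub` and `words` are hypothetical, introduced only for this sketch.

```python
from collections import Counter


def words(text: str) -> Counter:
    # Case-insensitive word counts, splitting on whitespace like the answer above.
    return Counter(w.lower() for w in text.split())


def contains_sub(sentence: Counter, phrase: Counter) -> bool:
    # Counter subtraction drops zero and negative counts, so the result is
    # empty only if every phrase word occurs at least as often in the sentence.
    return not (phrase - sentence)


sentences = [words(s) for s in ["Strings are an array of characters",
                                "Sentences are an array of words"]]
phrases = [words(p) for p in ["an array of", "sentences are strings"]]

results = [[j for j, s in enumerate(sentences, start=1) if contains_sub(s, p)]
           for p in phrases]
print(results)  # [[1, 2], []]
```

It gives the same result as `contains`; whether it is faster in practice is not established here, it mainly moves the loop into `Counter`'s own code.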

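Your current algorithm runs in roughly O(|sent| * |phrase| * k), where k is the average number of words in a sentence. Patrik's answer reduces k to the average number of words in a phrase, which in your case should be fewer than 10, so that's a big improvement.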
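To put rough numbers on this, take the bounds the asker gives in the comments (fewer than 10 words per sentence or phrase, fewer than 10,000 sentences and phrases) as worst-case values. This is back-of-the-envelope arithmetic, not a measurement:

$$
|\text{sent}| \cdot |\text{phrase}| \cdot k \;\le\; 10^4 \cdot 10^4 \cdot 10 \;=\; 10^9
$$

Since k is bounded by 10 either way, most of the cost sits in the $|\text{sent}| \cdot |\text{phrase}| \le 10^8$ sentence/phrase pairs, which is the factor an index over words can cut down in the average case.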

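Improving the worst case is probably not possible, but we can still improve the average case. The idea is to build an index whose keys are all the words that appear in the sentences, and whose values are the lists of indices of the sentences containing each word.

That way, for a given phrase we can check how many sentences each of its words appears in, and iterate only over the shortest of those lists. For example, if the phrase contains a word that appears in no sentence, we skip iterating over the sentences for that phrase entirely.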

from collections import Counter
from collections import defaultdict

def containsQty(sentence, phrase):
    # How many times the sentence contains the phrase's words: the minimum,
    # over phrase words, of (count in sentence) // (count in phrase)
    qty = 100000
    for word in phrase:
        qty = min(qty, sentence[word] // phrase[word])
        if qty == 0:
            break  # a missing word rules the sentence out immediately
    return qty

sent = ["bob and alice like to text each other", "bob does not like to ski but does not like to fall", "alice likes to ski"]
ph = ["bob alice", "alice", "like"]

sent = [Counter(word.lower() for word in sentence.split()) for sentence in sent]
ph   = [Counter(word.lower() for word in sentence.split()) for sentence in ph]

# Inverted index: word -> list of 1-based numbers of the sentences containing it
indexByWords = defaultdict(list)

for index, counter in enumerate(sent, start=1):
    for word in counter.keys():
        indexByWords[word].append(index)


for i, phrase in enumerate(ph, start=1):
    print("Phrase{}:".format(i))

    # Pick the phrase word that occurs in the fewest sentences;
    # only those sentences can possibly contain the whole phrase
    best = []
    minQty = len(sent) + 1
    for word in phrase.keys():
        candidates = indexByWords.get(word, [])
        if minQty > len(candidates):
            minQty = len(candidates)
            best = candidates

    matched = False
    for index in best:
        qty = containsQty(sent[index - 1], phrase)
        if qty > 0:
            matched = True
            # Print the sentence number once per occurrence of the phrase
            print((str(index) + ' ') * qty)
    if not matched:
        print("NONE")
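A variant of the same index, sketched under the assumption that only membership matters (repeated words are ignored, as in the set-based answer below, so "like" matches sentence 2 once rather than twice): store a set of sentence numbers per word and intersect the sets for all words of the phrase, smallest first. This swaps the per-sentence recount for set intersection; `build_index` and `match` are hypothetical names.

```python
from collections import defaultdict


def build_index(sentences):
    # Map each lowercase word to the set of 1-based sentence numbers containing it.
    index = defaultdict(set)
    for number, sentence in enumerate(sentences, start=1):
        for word in sentence.lower().split():
            index[word].add(number)
    return index


def match(index, phrase):
    # Intersect posting sets, smallest first, so the running result only shrinks.
    words = set(phrase.lower().split())
    if not words:
        return []
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        if not result:
            break  # nothing left to match, skip the remaining words
        result &= posting
    return sorted(result)


sent = ["bob and alice like to text each other",
        "bob does not like to ski but does not like to fall",
        "alice likes to ski"]
ph = ["bob alice", "alice", "like"]

index = build_index(sent)
for i, phrase in enumerate(ph, start=1):
    print("Phrase{}:".format(i))
    found = match(index, phrase)
    if found:
        print(*found)
    else:
        print("NONE")
```

Using `index.get` instead of `index[w]` avoids inserting empty sets into the `defaultdict` for words that never occur in a sentence.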

I'm trying to do it in O(n^2) with the following code:

import time

start = time.perf_counter()

sent = ["Strings are an array of characters",
        "Sentences are an array of words"]
ph = ["an array of", "sentences are strings"]

# Split (and lowercase) every phrase and sentence into a set of words once,
# instead of rebuilding the sentence sets for every phrase
s2 = [set(c.lower().split()) for c in ph]
s1 = [set(d.lower().split()) for d in sent]

for i, phcount in enumerate(s2, start=1):
    print("Phrase{}:".format(i))
    # 1-based numbers of the sentences whose word set contains the phrase's words
    z = [idx1 for idx1, sentcount in enumerate(s1, start=1) if phcount <= sentcount]
    if z:
        print(*z)
    else:
        print("NONE")

# Elapsed time in milliseconds
print((time.perf_counter() - start) * 1000)
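To compare approaches empirically rather than by their O(...) labels, a rough benchmark harness like the following could be used. It assumes sentences and phrases are already tokenized into lists of words; the random generator, the 500-word vocabulary and the sizes are arbitrary assumptions, loosely within the bounds from the comments. `time.perf_counter` is used because it is meant for measuring short intervals. Timings depend on the machine, so none are quoted here.

```python
import random
import time
from collections import defaultdict


def scan(sentences, phrases):
    # Pairwise approach: test every phrase against every sentence (subset check).
    sent_sets = [set(s) for s in sentences]
    results = []
    for p in phrases:
        words = set(p)
        results.append([j for j, s in enumerate(sent_sets, start=1) if words <= s])
    return results


def indexed(sentences, phrases):
    # Inverted-index approach: word -> set of 1-based sentence numbers.
    index = defaultdict(set)
    for j, s in enumerate(sentences, start=1):
        for w in s:
            index[w].add(j)
    results = []
    for p in phrases:
        # Intersect posting sets, smallest first (phrases here are never empty).
        postings = sorted((index.get(w, set()) for w in set(p)), key=len)
        found = set(postings[0])
        for posting in postings[1:]:
            found &= posting
        results.append(sorted(found))
    return results


def random_text(rng, vocab, max_words):
    # Hypothetical generator: between 1 and max_words words drawn from vocab.
    return rng.choices(vocab, k=rng.randint(1, max_words))


rng = random.Random(42)  # fixed seed so runs are repeatable
vocab = ["w{}".format(i) for i in range(500)]
sentences = [random_text(rng, vocab, 9) for _ in range(1000)]
phrases = [random_text(rng, vocab, 3) for _ in range(1000)]

for fn in (scan, indexed):
    start = time.perf_counter()
    fn(sentences, phrases)
    print("{}: {:.1f} ms".format(fn.__name__, (time.perf_counter() - start) * 1000))
```

Both functions return the same lists for non-empty phrases, so the harness doubles as a consistency check when trying out an optimization.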

Does order matter? Does the sentence "strings are sentences" contain "sentences are strings"?

Order doesn't matter. @PatrickHaugh

It depends on the information you have. How many sentences and phrases are there? How long can they be? On average, how many sentences do you expect to match each phrase?

Assume the number of words in a sentence or phrase is less than 10, and the number of sentences and phrases is less than 10000. @juvian

I think you can find a better answer to this question.

I really appreciate your answer, it looks great. But it still fails some of the test cases, and I don't know why :| @juvian

@Robin Such as? I didn't make it case-insensitive.

Unfortunately, I can't see the test cases. Honestly, I'm really frustrated about it. It just tells me "wrong answer" :/ @juvian

@Robin Do you have a link to the problem?

It's actually a coding-challenge problem, so I don't think you can access it. But if you can help, I can email you a screenshot of the problem. @juvian