Python dictionary replace with spaces in keys

python algorithm dictionary replace

I have a string and a dictionary, and I need to replace every occurrence of a dictionary key in that text.

text = 'I have a smartphone and a Smart TV'
dict = {
    'smartphone': 'toy',
    'smart tv': 'junk'
}

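If the keys contained no spaces, I would split the text into words and look each word up in the dict, which looks like O(n). But now the keys contain spaces, so things are more complicated. Please suggest a good way to do this, and note that the case of a key may not match the case of the text.

Update

I have thought of this solution, but it is not efficient: O(m*n) or more.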

# Python 3: dict.items() replaces the Python 2 dict.iteritems()
for k, v in dict.items():
    text = text.replace(k, v)  # or regex...

If the keys have no spaces:

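output = [dct[i] if i in dct else i for i in text.split()]
result = ' '.join(output)

You should use dct rather than dict as the variable name, so it does not shadow the built-in dict().

This uses a list comprehension with a conditional expression to filter the data.

If your keys do have spaces, then you are correct: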

for k, v in dct.items():
    # str.replace returns a new string, so the result must be reassigned
    text = text.replace(k, v)

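Yes, the time complexity is m*n, because the string has to be traversed once for every key in dct.

Convert all dictionary keys and the input text to lowercase so the comparison is easy. Now: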

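for entry in my_dict:
    if entry in text:
        pass  # process the match

This assumes the dictionary is small enough to make scanning the text once per key worthwhile. If instead the dictionary is large and the text is small, you need to take each word, then each two-word phrase, and check whether it is in the dictionary.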
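The word-by-word and phrase-by-phrase lookup described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `replace_phrases` and the `max_words` limit are hypothetical, tokens are split on whitespace (so punctuation attached to a word prevents a match), and runs of whitespace are collapsed to single spaces in the result.

```python
def replace_phrases(text, mapping, max_words=3):
    """Greedy longest-match replacement over whitespace-split tokens.

    `max_words` is an assumed upper bound on the number of words in a key.
    """
    # Lowercase the keys once so lookups are case-insensitive.
    lookup = {key.lower(): value for key, value in mapping.items()}
    words = text.split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest window first so 'smart tv' wins over a shorter key.
        for size in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + size]).lower()
            if phrase in lookup:
                output.append(lookup[phrase])
                i += size
                break
        else:
            # No key starts at this word; keep the word unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(replace_phrases('I have a smartphone and a Smart TV',
                      {'smartphone': 'toy', 'smart tv': 'junk'}))
# I have a toy and a junk
```

With a fixed `max_words`, each word triggers a bounded number of dictionary lookups, so the work grows roughly linearly with the number of words in the text.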

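Is that enough to get you going?

You need to test every run of adjacent words, from length 1 (each single word) up to the number of words in the text (the whole string). You can generate these adjacent permutations like this: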

text = 'I have a smartphone and a Smart TV'

array = text.lower().split()

# i is the phrase length in words, j is the index of the first word
key_permutations = [" ".join(array[j:j + i])
                    for i in range(1, len(array) + 1)
                    for j in range(0, len(array) - (i - 1))]

>>> key_permutations
['i', 'have', 'a', 'smartphone', 'and', 'a', 'smart', 'tv',
 'i have', 'have a', 'a smartphone', 'smartphone and', 'and a', 'a smart', 'smart tv',
 'i have a', 'have a smartphone', 'a smartphone and', 'smartphone and a', 'and a smart', 'a smart tv',
 'i have a smartphone', 'have a smartphone and', 'a smartphone and a', 'smartphone and a smart', 'and a smart tv',
 'i have a smartphone and', 'have a smartphone and a', 'a smartphone and a smart', 'smartphone and a smart tv',
 'i have a smartphone and a', 'have a smartphone and a smart', 'a smartphone and a smart tv',
 'i have a smartphone and a smart', 'have a smartphone and a smart tv',
 'i have a smartphone and a smart tv']

Now we replace using the dictionary:

import re

for permutation in key_permutations:
    if permutation in dict:
        text = re.sub(re.escape(permutation), dict[permutation], text, flags=re.IGNORECASE)

>>> text
'I have a toy and a junk'

You may want to iterate in the reverse order, though, longest first, so that more specific phrases take precedence over single words.
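A minimal sketch of that longest-first ordering is shown below. The function name `replace_longest_first` and the extra `'smart'` key are hypothetical, added only to show why the order matters. The sketch also assumes lowercase keys that start and end with word characters, and it wraps each phrase in `\b` word boundaries so that a short key cannot match inside a longer word.

```python
import re


def replace_longest_first(text, mapping):
    """Try word runs from the longest to the shortest (keys assumed lowercase)."""
    words = text.lower().split()
    # Generate runs of adjacent words, longest (the whole text) first.
    candidates = [
        " ".join(words[start:start + size])
        for size in range(len(words), 0, -1)
        for start in range(len(words) - size + 1)
    ]
    for phrase in candidates:
        if phrase in mapping:
            # \b keeps 'smart' from matching inside 'smartphone'.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            text = re.sub(pattern, mapping[phrase], text, flags=re.IGNORECASE)
    return text


# 'smart' is an illustrative extra key, not part of the original question.
dct = {'smartphone': 'toy', 'smart tv': 'junk', 'smart': 'clever'}
print(replace_longest_first('I have a smartphone and a Smart TV', dct))
# I have a toy and a junk
```

With shortest-first ordering, the single word 'smart' would be replaced before 'smart tv' is ever tried, so the two-word key would no longer match.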
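If the keywords in the text are not adjacent to each other (keyword other keyword), we can do this. It looks like O(n) to me.

This is easy to do with regular expressions: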

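import re

text = 'I have a smartphone and a Smart TV'
dict = {
    'smartphone': 'toy',
    'smart tv': 'junk'
}

# Python 3: dict.items() replaces the Python 2 dict.iteritems()
for k, v in dict.items():
    # re.escape treats the key as literal text; re.I ignores case
    regex = re.compile(re.escape(k), flags=re.I)
    text = regex.sub(v, text)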

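If the replacement value of one item is part of the search term of another item, this still has the problem that the result depends on the order in which the dict keys are processed.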
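One way to avoid that order dependence, sketched here as an alternative rather than as the approach taken in this answer, is to combine all keys into a single alternation and make one pass with a replacement callback, so text that has already been replaced is never scanned again. The function name `replace_keys` is hypothetical, and the sketch assumes plain keys for which `str.lower()` agrees with the regex's case-insensitive matching.

```python
import re


def replace_keys(text, mapping):
    """Replace every key of `mapping` found in `text` in a single regex pass."""
    if not mapping:
        return text
    # Lowercase the keys once so the callback can look up any matched casing.
    lookup = {key.lower(): value for key, value in mapping.items()}
    # Longest keys first, so the alternation prefers 'smart tv' over a shorter key.
    keys = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys),
                         flags=re.IGNORECASE)
    # The callback maps the matched text to its value; output is not rescanned.
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


print(replace_keys('I have a smartphone and a Smart TV',
                   {'smartphone': 'toy', 'smart tv': 'junk'}))
# I have a toy and a junk
print(replace_keys('google is bigger than yahoo',
                   {'google': 'yahoo', 'yahoo': 'google'}))
# yahoo is bigger than google
```

Because each position in the text is matched at most once, swapping pairs such as google/yahoo come out correctly, which repeated `str.replace` calls cannot guarantee.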
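The keys have spaces, so you cannot split.

String replace will fail if the dict has something like my_dict = {"google": "yahoo", "yahoo": "google"} and the text is "google is bigger than yahoo".

The dict may have keys of 3 words, 4 words... who knows. And your algorithm is not efficient.

I believe that with a bounded word count it is O(n). If it is bounded only by the input length, then it is O(n^2), but if you break the input into phrases at punctuation, n is also fairly limited. Is that tractable for your application?

If "entry in text" takes O(n) to compare, and "for entry in my_dict" takes O(m), then it is O(n*m) :O

Can you explain the complexity? It looks very complex to me. O(m^n) maybe @.@

String replace will fail if the dict has something like my_dict = {"google": "yahoo", "yahoo": "google"} and the text is "google is bigger than yahoo", as I pointed out in my answer.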