Python: match phrases in dictionary values against sentences (the dictionary keys) and produce output based on the match
I have a dictionary in which each key is a sentence and the value is a list of specific words or phrases from that sentence. For example:
dict1 = {'it is lovely weather and it is kind of warm':['lovely weather', 'it is kind of warm'],'and the weather is rainy and cold':['rainy and cold'],'the temperature is ok':['temperature']}
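I would like to tag each sentence in the output according to whether a word or phrase appears in the dictionary values.
For this example, each word or phrase in the output is tagged 0 if it is not in the values and 1 if it is.
I can get something similar by hard-coding the number of words in the phrase: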
for k, v in dict1.items():
    for phrase in v:  # each value is a list of phrases, so handle them one by one
        words_in_val = phrase.split()
        words = k.split()
        if len(words_in_val) == 1:
            # single-word phrase: compare it with every word of the sentence
            for each_word in words:
                if phrase == each_word:
                    print(each_word + '\t' + '1')
                else:
                    print(each_word + '\t' + '0')
        if len(words_in_val) == 2:
            # two-word phrase: merge the matching pair into one list element
            for index, item in enumerate(words[:-1]):
                if words[index] == words_in_val[0]:
                    if words[index+1] == words_in_val[1]:
                        words[index] = ' '.join(words_in_val)
                        del words[index+1]
        # ....something like this...
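My problem is that I can see this starting to get messy, and in theory the phrases I want to match could contain an unlimited number of words, although in practice they are usually short.
So I would do it like this: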
dict1 = {
    'it is lovely weather and it is kind of warm': ['it is kind of', 'it is kind'],
    'and the weather is rainy and cold': ['rainy and cold'],
    'the temperature is ok': ['temperature'],
}

def tag_sentences(sentences):
    tag_id = 1  # every matched phrase gets its own non-zero id
    tagged_results = []
    for sentence, phrases in sentences.items():
        words = sentence.split()
        phrases_split = [phrase.split() for phrase in phrases]
        positions_keeper = {}  # phrase index -> how many of its words matched so far
        sentence_results = [(word, 0) for word in words]
        for word_index, word in enumerate(words):
            for index, phrase in enumerate(phrases_split):
                position = positions_keeper.get(index, 0)
                if phrase[position] == word:
                    if len(phrase) > position + 1:
                        positions_keeper[index] = position + 1
                    else:
                        # whole phrase matched: tag its words, walking backwards
                        for i in range(len(phrase)):
                            sentence_results[word_index - i] = (sentence_results[word_index - i][0], tag_id)
                        tag_id = tag_id + 1
                        positions_keeper[index] = 0  # start over for the next occurrence
                else:
                    positions_keeper[index] = 0
        tagged_results.append(sentence_results)
    return tagged_results

def print_tagged_results(tagged_results):
    for tagged_result in tagged_results:
        memory = 0  # id of the previous word
        memory_sentence = ""  # words of the phrase collected so far
        for result, tag_id in tagged_result:
            if memory != 0 and memory != tag_id:
                # the phrase has ended, so print it as one unit
                print(memory_sentence + "1")
                memory_sentence = ""
            if tag_id == 0:
                print(result, 0)
            else:
                memory_sentence += result + " "
            memory = tag_id
        if memory != 0:
            print(memory_sentence + "1")

tagged_results = tag_sentences(dict1)
print_tagged_results(tagged_results)
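This is basically doing the following. Each sentence starts out as a list of (word, 0) pairs:
[(it,0),(is,0),(lovely,0)…]
Every matched phrase is then tagged with its own id:
[(it,0),(is,0),(lovely,1)…(kind,2),(of,2),…]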
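The step from the tagged pairs to the printed lines can also be written with `itertools.groupby`, which groups consecutive items that share a key. This is only a minimal sketch of that idea, not the code from the answer: `collapse_tagged` is a hypothetical helper name, and the sample `tagged` list is made up to mirror the lists shown above.

```python
from itertools import groupby

def collapse_tagged(tagged):
    """Merge consecutive words that share a non-zero id into one phrase.

    `tagged` is a list of (word, id) pairs; the result is a list of
    (text, flag) pairs where flag is 1 for a phrase and 0 for a plain word.
    """
    lines = []
    for tag_id, group in groupby(tagged, key=lambda pair: pair[1]):
        words = [word for word, _ in group]
        if tag_id == 0:
            # untagged words stay one per line
            lines.extend((word, 0) for word in words)
        else:
            # a run of words with the same id is one matched phrase
            lines.append((" ".join(words), 1))
    return lines

# Hypothetical sample input, shaped like the lists above.
tagged = [("it", 0), ("is", 0), ("lovely", 1), ("weather", 1), ("and", 0)]
for text, flag in collapse_tagged(tagged):
    print(text, flag)
```

Because every phrase gets a distinct id, two different phrases that sit next to each other still end up in separate groups.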
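This will not work if one phrase is a sub-phrase of another, but your example never said how that case should be handled.

Was this question closed for being too vague?

Thanks, I didn't realize this would be so difficult. I thought I was just struggling to turn the code above into a loop.

"A small group of words standing together as a conceptual unit, typically forming a component of a clause." You would have to make the program understand what a "conceptual unit" is, and I think that is hard.

Hmm, I don't think so. Doesn't he already have all the phrases and words in the dictionary? So it is really just a fairly big lookup. The problem is that the phrases have different lengths, which could be handled with a while loop (I am only thinking this through in my head), but I don't think the program needs to understand phrases.

本达尔, this is more what I had in mind. It is more like changing the loop to something like "split each phrase in the value... then, based on the number of words in that phrase... split the key/sentence into overlapping chunks of the same length and check whether they are equal" (I only know how to do this in theory; I am not sure how to do it in practice).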
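The idea in the last comment (compare same-length chunks of the sentence against each phrase) can be sketched as below. This is a rough illustration under stated assumptions, not a definitive implementation: the function name `tag_sentence` is hypothetical, words are split on whitespace only, and trying longer phrases first is my own choice for the sub-phrase case that the thread leaves open.

```python
def tag_sentence(sentence, phrases):
    """Return (text, flag) pairs: flag is 1 for a matched phrase, 0 for a plain word."""
    words = sentence.split()
    # Assumption: try longer phrases first, so a phrase wins over its own sub-phrase.
    split_phrases = sorted((p.split() for p in phrases), key=len, reverse=True)
    result = []
    i = 0
    while i < len(words):
        for phrase in split_phrases:
            n = len(phrase)
            # Compare the chunk of n words starting at position i with the phrase.
            if n and words[i:i + n] == phrase:
                result.append((" ".join(phrase), 1))
                i += n  # skip past the whole matched phrase
                break
        else:
            # No phrase starts at this position, so this is a plain word.
            result.append((words[i], 0))
            i += 1
    return result

dict1 = {
    'it is lovely weather and it is kind of warm': ['lovely weather', 'it is kind of warm'],
    'and the weather is rainy and cold': ['rainy and cold'],
    'the temperature is ok': ['temperature'],
}

for sentence, phrases in dict1.items():
    for text, flag in tag_sentence(sentence, phrases):
        print(text + '\t' + str(flag))
```

Slicing with `words[i:i + n]` means the phrase length never has to be hard-coded, which is the part that was getting messy in the question.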