Using re.sub to replace words with their matching entries from a list (Python)


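Using regular expressions, I want to annotate the sentiment words in a given sentence with their corresponding polarity tags, so I wrote the following code: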

import re
vocab = ['good/POSI','bad/NEAG','strong/POSI','dirty/NEGA', 'never/SWIT']
sent = ["It is really good", "strong man never gets his body dirty"]

for token in vocab:
    word = re.sub(r'(\\w+)\\/[A-Z]+_[A-Z]+','\\1', token)
    TA = re.sub(str(word),str(token), str(sent))
print(TA)
I am trying to get a result like this:

["It is really good/POSI", "strong/POSI man never/SWIT gets his body dirty/NEGA"]
Unfortunately, it does not work, and I cannot tell which lines are wrong. Is there a better way to do this annotation?

I suggest turning the vocabulary list into a dictionary that maps each bare word to its tagged form:

>>> vocab = {v[:v.find('/')]: v for v in vocab}
>>> vocab
{'dirty': 'dirty/NEGA', 'good': 'good/POSI', 'never': 'never/SWIT', 'bad': 'bad/NEAG', 'strong': 'strong/POSI'}
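This way, you can match every \w+ and replace it with its value from the dictionary, leaving words that are not in the dictionary unchanged: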

result = []
for line in sent:
    line = re.sub(r'(\w+)', lambda w: vocab.get(w.group(), w.group()), line)
    result.append(line)
print(result)
This prints what you want:

['It is really good/POSI', 'strong/POSI man never/SWIT gets his body dirty/NEGA']
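A variation on the same idea, as a minimal sketch: instead of matching every word and looking it up, build one alternation pattern from the vocabulary so that only vocabulary words are matched at all. The names `tagged`, `pattern`, and `annotate` are illustrative and do not come from the question.

```python
import re

vocab = ['good/POSI', 'bad/NEAG', 'strong/POSI', 'dirty/NEGA', 'never/SWIT']
sent = ["It is really good", "strong man never gets his body dirty"]

# Map each bare word to its tagged form, e.g. 'good' -> 'good/POSI'.
tagged = {entry.split('/', 1)[0]: entry for entry in vocab}

# One alternation that matches vocabulary words only, and only as whole words.
# re.escape guards against vocabulary entries containing regex metacharacters.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, tagged)) + r')\b')

def annotate(line):
    """Return line with every vocabulary word replaced by its tagged form."""
    return pattern.sub(lambda m: tagged[m.group()], line)

annotated = [annotate(line) for line in sent]
print(annotated)
```

The `\b` anchors keep a word such as "badly" from being partially tagged, which is the same whole-word behaviour the `\w+` lookup gives.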

As it stands, this only works for "never/SWIT", because on every iteration of the loop you start again from the unmodified sentences.
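To illustrate that point, here is a hedged sketch of how the original loop could be kept while carrying the result forward from one iteration to the next. It also sidesteps two other likely problems in the question's code: the tag pattern expects an underscore that tags such as POSI do not contain, and `str(sent)` turns the whole list into a single string. The variable name `annotated` and the use of `split` instead of a regex to extract the word are choices made for this sketch.

```python
import re

vocab = ['good/POSI', 'bad/NEAG', 'strong/POSI', 'dirty/NEGA', 'never/SWIT']
sent = ["It is really good", "strong man never gets his body dirty"]

# Start from the original sentences and keep updating the same list,
# so each iteration builds on the replacements made by the previous ones.
annotated = list(sent)
for token in vocab:
    word = token.split('/', 1)[0]               # 'good/POSI' -> 'good'
    word_re = r'\b' + re.escape(word) + r'\b'   # match the whole word only
    # A function replacement avoids any backslash processing of the token.
    annotated = [re.sub(word_re, lambda m: token, line) for line in annotated]

print(annotated)
```

This makes one pass over the sentences per vocabulary entry, so for a large vocabulary the dictionary lookup approach is probably the better fit.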