Using re.sub to replace words from a list with their corresponding tagged matches (Python)
Using regular expressions, I want to annotate the sentiment words in a given sentence with their corresponding polarity tags, so I wrote the following code:
import re
vocab = ['good/POSI','bad/NEAG','strong/POSI','dirty/NEGA', 'never/SWIT']
sent = ["It is really good", "strong man never gets his body dirty"]
for token in vocab:
    word = re.sub(r'(\\w+)\\/[A-Z]+_[A-Z]+','\\1', token)
    TA = re.sub(str(word),str(token), str(sent))
print(TA)
I am trying to get a result like this:
["It is really good/POSI", "strong/POSI man never/SWIT gets his body dirty/NEGA"]
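Unfortunately, I can't get it to work, and I don't know which lines are wrong.
Is there a better way to do this annotation?

I suggest converting the vocabulary list into a dictionary that maps each bare word to its tagged form: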
>>> vocab = {v[:v.find('/')]: v for v in vocab}
>>> vocab
{'dirty': 'dirty/NEGA', 'good': 'good/POSI', 'never': 'never/SWIT', 'bad': 'bad/NEAG', 'strong': 'strong/POSI'}
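This way, you can match every word with \w+ and replace it with its value from the dictionary; words that are not in the dictionary are left unchanged: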
result = []
for line in sent:
    line = re.sub(r'(\w+)', lambda w: vocab.get(w.group(), w.group()), line)
    result.append(line)
print(result)
This outputs what you want:
['It is really good/POSI', 'strong/POSI man never/SWIT gets his body dirty/NEGA']
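Putting the pieces together, a self-contained sketch might look like the following. The function name `annotate` and the variable name `tagged` are my own, and `str.partition` is used here only as an alternative to `find` for splitting off the word; none of these come from the question.

```python
import re

vocab = ['good/POSI', 'bad/NEAG', 'strong/POSI', 'dirty/NEGA', 'never/SWIT']
sent = ["It is really good", "strong man never gets his body dirty"]

# Map each bare word to its tagged form, e.g. 'good' -> 'good/POSI'.
tagged = {entry.partition('/')[0]: entry for entry in vocab}

def annotate(line, lookup):
    """Replace every word found in `lookup` with its tagged form.

    Words that are not keys of `lookup` are returned unchanged.
    """
    return re.sub(r'\w+', lambda m: lookup.get(m.group(), m.group()), line)

result = [annotate(line, tagged) for line in sent]
print(result)
```

Note that the lookup is case-sensitive, so "Good" at the start of a sentence would not be tagged unless you normalise the case yourself.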
As it stands, this only works for "never/SWIT", because on each iteration of the inner loop you start again from the unmodified line.
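To illustrate that remark: if you would rather keep a loop over the vocabulary, each substitution has to be applied to the result of the previous one instead of to the original sentence. Here is a minimal sketch, assuming every entry has the form `word/TAG`; the helper name `annotate_with_loop` is made up for this example.

```python
import re

vocab = ['good/POSI', 'bad/NEAG', 'strong/POSI', 'dirty/NEGA', 'never/SWIT']
sent = ["It is really good", "strong man never gets his body dirty"]

def annotate_with_loop(line, entries):
    """Apply one substitution per vocabulary entry, accumulating the changes."""
    for entry in entries:
        word = entry.split('/', 1)[0]
        # \b restricts the match to whole words, so 'bad' does not match 'badly'.
        pattern = r'\b' + re.escape(word) + r'\b'
        # Reassign `line` so the next entry sees the already-annotated text.
        line = re.sub(pattern, lambda _match: entry, line)
    return line

result_loop = [annotate_with_loop(line, vocab) for line in sent]
print(result_loop)
```

This makes one pass over the sentence per vocabulary entry, so for a large vocabulary the dictionary lookup with a single `re.sub` call is likely the better choice.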