Python: tag all words between a negation word (such as don't or never) and a punctuation mark as negated
Tags: python, nlp, text-mining, regex-negation, sentiment-analysis

I am trying to build a regex match-and-replace routine that takes every word appearing between a negation word and a punctuation mark and appends a _NEG suffix to it. For example:

Text: I don't want to go there: it might be dangerous.
Output: I don't want_NEG to_NEG go_NEG there_NEG: it might be dangerous.

I have tried almost everything, but without success. Here is a snapshot of the code I am working with:
import re

# negation cue, then everything up to the last punctuation mark (.* is greedy)
regex1 = "(never|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint|n't)(.*)[.:;!?]"
regcom = re.compile(regex1)

def tag(text):
    negative = []
    matching = regcom.findall(text)
    if len(matching) == 0:
        return text
    # only the first match is used: (cue, text after the cue)
    matching = list(matching[0])
    matching = matching[0] + " " + matching[1]
    matching = matching.split()
    for neg in matching:
        negative.append(neg)
    # note: this substitutes over the whole text, not just the matched span
    for neg in negative:
        text = re.sub(neg + '(?!_NEG)', neg + '_NEG ', text)
    return text
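Try the code above with Text = "I don't want to go there: it might be dangerous."

It only partially works. When I apply it to general text, it also produces many logical and syntactic errors.

Any help would be greatly appreciated. P.S.: I have been referring to Chris Potts's work on negation handling in sentiment analysis.

Comments:

There may be an answer here:

You say "it only partially works". Let's see what the output looks like.

Hi Robert, the problem with my code is that it tags a word with _NEG everywhere it occurs in the sentence, not only inside the negation region I am interested in. For example, if the sentence is "I don't want to eat it: it might cause allergy", "it" gets tagged with _NEG wherever it appears in the sentence. That is not what I want. I want "it" to be tagged with _NEG only when it occurs inside the negation region (between the negation word and the punctuation mark). Hope that helps.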
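The over-tagging described in the last comment comes from running re.sub over the whole text once per word. One possible way to confine the tagging, sketched here on the assumption that a negation scope runs from the cue up to the next of . : ; ! ?, is to pass a replacement function to re.sub so that only the matched span is rewritten. The names NEG_SCOPE, _tag_span and tag_scope are hypothetical, and the cue list is shortened (any word ending in n't is treated as a cue).

```python
import re

# Hypothetical, abbreviated cue list: a few negation words plus any "...n't" form.
# Group 1 is the cue, group 2 is everything up to (not including) the next
# punctuation mark from . : ; ! ?
NEG_SCOPE = re.compile(
    r"\b(never|nothing|nowhere|noone|none|not|\w+n't)\b([^.:;!?]*)"
)

def _tag_span(match):
    """Append _NEG to each word inside the matched scope, leaving the cue as is."""
    cue, scope = match.group(1), match.group(2)
    tagged = re.sub(r"[A-Za-z']+", lambda w: w.group(0) + "_NEG", scope)
    return cue + tagged

def tag_scope(text):
    """Tag words only between a negation cue and the next punctuation mark."""
    return NEG_SCOPE.sub(_tag_span, text)

print(tag_scope("I don't want to eat it: it might cause allergy"))
# I don't want_NEG to_NEG eat_NEG it_NEG: it might cause allergy
```

Because the substitution only touches the text captured in the scope group, the second "it" after the colon is left alone. A second cue inside an open scope is simply tagged like any other word in this sketch.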
import re

def tag_words(sentence):
    # Tag the words that follow a negative word, up to the first
    # punctuation mark in the sentence.
    # Assumes the sentence contains at least one of . : ; ! ?
    # find the first punctuation mark in the sentence
    punct = re.findall(r'[.:;!?]', sentence)[0]
    # create word set from sentence
    wordSet = {x for x in re.split("[.:;!?, ]", sentence) if x}
    keywordSet = {"don't", "never", "nothing", "nowhere", "noone", "none", "not",
                  "hasn't", "hadn't", "can't", "couldn't", "shouldn't", "won't",
                  "wouldn't", "doesn't", "didn't", "isn't", "aren't", "ain't"}
    # find negative words in sentence
    neg_words = wordSet & keywordSet
    if neg_words:
        for word in neg_words:
            # text up to and including the negative word
            start_to_w = sentence[:sentence.find(word) + len(word)]
            # put tags on the words after the negative word
            w_to_punct = re.sub(r'\b([A-Za-z\']+)\b', r'\1_NEG',
                                sentence[sentence.find(word) + len(word):sentence.find(punct)])
            # the punctuation mark and everything after it stay untagged
            punct_to_end = sentence[sentence.find(punct):]
            print(start_to_w + w_to_punct + punct_to_end)
    else:
        print("no negative words found ...")
s1 = "I don't want to go there: it might be dangerous"
tag_words(s1)
# I don't want_NEG to_NEG go_NEG there_NEG: it might be dangerous

s2 = "I want never to go there: it might be dangerous"
tag_words(s2)
# I want never to_NEG go_NEG there_NEG: it might be dangerous

s3 = "I couldn't to go there! it might be dangerous"
tag_words(s3)
# I couldn't to_NEG go_NEG there_NEG! it might be dangerous
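The function above prints its result and works from the first punctuation mark in the sentence only. For text with several clauses, a token-based pass may be easier to reason about: walk the tokens, open a negation scope at a cue, and close it at the next punctuation mark. The following is a minimal sketch of that idea, not the answer's method; NEGATION_CUES, CLAUSE_END and mark_negation are hypothetical names, and the cue list is shortened.

```python
import re

# Hypothetical, abbreviated cue list; any token ending in "n't" is also a cue.
NEGATION_CUES = {"never", "nothing", "nowhere", "noone", "none", "not"}
# Punctuation marks that close a negation scope.
CLAUSE_END = set(".:;!?")

def mark_negation(sentence):
    """Return a list of tokens, with _NEG appended to words inside a negation scope."""
    # Word tokens (letters and apostrophes) or single non-space symbols.
    tokens = re.findall(r"[A-Za-z']+|[^\sA-Za-z']", sentence)
    in_scope = False
    out = []
    for tok in tokens:
        if tok in CLAUSE_END:
            in_scope = False          # punctuation closes the scope
            out.append(tok)
        elif not tok[0].isalpha():
            out.append(tok)           # commas, digits, etc. pass through untagged
        elif in_scope:
            out.append(tok + "_NEG")  # word inside an open scope
        else:
            out.append(tok)
            low = tok.lower()
            if low in NEGATION_CUES or low.endswith("n't"):
                in_scope = True       # cue opens a scope; the cue itself is untagged
    return out

print(mark_negation("I don't want to eat it: it might not cause allergy."))
# ['I', "don't", 'want_NEG', 'to_NEG', 'eat_NEG', 'it_NEG', ':', 'it', 'might',
#  'not', 'cause_NEG', 'allergy_NEG', '.']
```

Returning tokens rather than a rebuilt string avoids guessing at the original spacing; joining them back is left to the caller.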