Python: mark all words between a negation word (such as don't or never) and a punctuation mark as negated

Tags: python, nlp, text-mining, regex-negation, sentiment-analysis

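I am trying to build a regex match-and-replace routine that takes all the words appearing between a negation word and a punctuation mark and appends a _NEG suffix to each of them.

For example:

Text: I don't want to go there: it might be dangerous.
Output: I don't want_NEG to_NEG go_NEG there_NEG: it might be dangerous

I have tried almost everything, but nothing has worked. Below is a snapshot of the code I am currently trying: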

import re

regex1 = "(never|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint|n't)(.*)[.:;!?]"
regcom = re.compile(regex1)

def tag(text):
    negative = []
    matching = regcom.findall(text)
    if len(matching) == 0:
        return text
    # Keep only the first match: the negation word and the text captured after it
    matching = list(matching[0])
    matching = matching[0] + " " + matching[1]
    matching = matching.split()
    for neg in matching:
        negative.append(neg)
    # Note: re.sub runs over the whole text, not only over the matched region
    for neg in negative:
        text = re.sub(neg + '(?!_NEG)', neg + '_NEG ', text)
    return text
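When I try the code above with text = "I don't want to go there: it might be dangerous", it only partly works. When I apply it to general text, it also produces many logical and syntax errors.
Any help would be greatly appreciated.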

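P.S.: I have been referring to Chris Potts' work on negation handling in sentiment analysis.

Comment: You say "it only partly works". Let's see what the output looks like.

Comment: Hi Robert, the problem with my code is that it adds _NEG to every occurrence, anywhere in the sentence, of any word that appears in the negated region I am interested in. For example, if the sentence is "I don't want to eat it: it might cause an allergy", then "it" gets tagged with _NEG everywhere it appears in the sentence. That is not what I want. I want "it" to be tagged with _NEG only where it occurs inside the negated region (between the negation word and the punctuation mark). Hope that helps.

Here is a possible answer:
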
import re

def tag_words(sentence):
    # Tag the words that follow a negative word, up to the next punctuation mark
    # create word set from sentence
    wordSet = {x for x in re.split(r"[.:;!?, ]", sentence) if x}
    keywordSet = {"don't", "never", "nothing", "nowhere", "noone", "none", "not",
                  "haven't", "hasn't", "hadn't", "can't", "couldn't", "shouldn't",
                  "won't", "wouldn't", "doesn't", "didn't", "isn't", "aren't", "ain't"}
    # find negative words in sentence
    neg_words = wordSet & keywordSet
    if neg_words:
        for word in neg_words:
            neg_end = sentence.find(word) + len(word)
            start_to_w = sentence[:neg_end]
            # the negated region ends at the first punctuation mark after the
            # negative word, or at the end of the sentence if there is none
            punct = re.search(r'[.:;!?]', sentence[neg_end:])
            region_end = neg_end + punct.start() if punct else len(sentence)
            # put tags on the words after the negative word
            w_to_punct = re.sub(r"\b([A-Za-z']+)\b", r'\1_NEG',
                                sentence[neg_end:region_end])
            punct_to_end = sentence[region_end:]
            print(start_to_w + w_to_punct + punct_to_end)
    else:
        print("no negative words found ...")


s1 = "I don't want to go there: it might be dangerous"
tag_words(s1)
# I don't want_NEG to_NEG go_NEG there_NEG: it might be dangerous
s2 = "I want never to go there: it might be dangerous"
tag_words(s2)
# I want never to_NEG go_NEG there_NEG: it might be dangerous
s3 = "I couldn't to go there! it might be dangerous"
tag_words(s3)
# I couldn't to_NEG go_NEG there_NEG! it might be dangerous
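
The function above prints one line per negative word and does not return anything. As an alternative, here is a minimal sketch (not part of the original answer) that does the tagging in a single re.sub call with a replacement function, so only the words inside the negated region are touched and every negated region in the text is handled. The names NEG_WORDS, REGION and tag_negation are made up for this illustration, and the word list is an assumption based on the list in the question, with the apostrophe made optional.

```python
import re

# Negation cues based on the question's list (an assumption, extend as needed).
# The apostrophe is optional, so both "don't" and "dont" match.
NEG_WORDS = (r"never|nothing|nowhere|noone|none|not|"
             r"(?:have|has|had|ca|could|should|wo|would|do|does|did|is|are|ai)n'?t")

# Group 1: the negation word. Group 2: everything after it, up to (but not
# including) the next punctuation mark, or the end of the text.
REGION = re.compile(rf"\b({NEG_WORDS})\b([^.:;!?]*)", re.IGNORECASE)

# A word inside the negated region, with an optional apostrophe part.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def tag_negation(text):
    """Return text with _NEG appended to each word in a negated region."""
    def tag_region(match):
        cue, scope = match.group(1), match.group(2)
        # Only the captured scope is rewritten, so the same word appearing
        # outside the negated region is left alone.
        return cue + WORD.sub(lambda m: m.group(0) + "_NEG", scope)
    return REGION.sub(tag_region, text)


print(tag_negation("I don't want to eat it: it might cause an allergy"))
```

Because the replacement function only sees the matched region, the second "it" in the example stays untagged, which is the behaviour asked for in the comments. Whether commas should also end a negated region is a design choice; this sketch uses the same punctuation set as the question.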