Need advice on negation handling when doing aspect-based sentiment analysis in Python


I am trying to write Python code that performs aspect-based sentiment analysis on product reviews using a dependency parser. I created a sample review:

"The Sound Quality is great but the battery life is bad."

The output is: [['soundquality', ['great']], ['batterylife', ['bad']]]

It correctly extracts the aspects and their adjectives from this sentence, but when I change the text to:

"The Sound Quality is not great but the battery life is not bad."

the output stays the same. How can I add negation handling to my code? And is there a way to improve what I currently have?

import pandas as pd
import numpy as np
import nltk
from nltk.corpus import stopwords
from nltk.corpus import wordnet
from nltk.stem.wordnet import WordNetLemmatizer
import stanfordnlp

stanfordnlp.download('en')
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

txt = "The Sound Quality is not great but the battery life is not bad."

txt = txt.lower()
sentList = nltk.sent_tokenize(txt)

taggedList = []
for line in sentList:
    txt_list = nltk.word_tokenize(line) # tokenize sentence
    taggedList = taggedList + nltk.pos_tag(txt_list) # perform POS-Tagging
print(taggedList)

newwordList = []
flag = 0
# merge consecutive NN tokens into one compound token, e.g. "sound" + "quality" -> "soundquality"
for i in range(0,len(taggedList)-1):
    if(taggedList[i][1]=='NN' and taggedList[i+1][1]=='NN'):
        newwordList.append(taggedList[i][0]+taggedList[i+1][0])
        flag=1
    else:
        if(flag == 1):
            flag=0
            continue
        newwordList.append(taggedList[i][0])
        if(i==len(taggedList)-2):
            newwordList.append(taggedList[i+1][0])
finaltxt = ' '.join(word for word in newwordList)
print(finaltxt)

stop_words = set(stopwords.words('english'))
new_txt_list = nltk.word_tokenize(finaltxt)
wordsList = [w for w in new_txt_list if not w in stop_words]
taggedList = nltk.pos_tag(wordsList)

nlp = stanfordnlp.Pipeline()
doc = nlp(finaltxt)
dep_node = []
# collect [dependent word, governor index, relation] for every edge in the parse
for dep_edge in doc.sentences[0].dependencies:
    dep_node.append([dep_edge[2].text, dep_edge[0].index, dep_edge[1]])
for i in range(0, len(dep_node)):
    if(int(dep_node[i][1]) != 0):
        dep_node[i][1] = newwordList[(int(dep_node[i][1]) - 1)]
print(dep_node)

featureList = []
categories = []
totalfeatureList = []
for i in taggedList:
    if(i[1]=='JJ' or i[1]=='NN' or i[1]=='JJR' or i[1]=='NNS' or i[1]=='RB'):
        featureList.append(list(i))
        totalfeatureList.append(list(i)) # stores all the features for every sentence
        categories.append(i[0])
print(featureList)
print(categories)

fcluster = []
# pair each candidate feature with the words connected to it by the relations listed below
for i in featureList:
    filist = []
    for j in dep_node:
        if((j[0]==i[0] or j[1]==i[0]) and (j[2] in ["nsubj", "acl:relcl", "obj", "dobj", "agent", "advmod", "amod", "neg", "prep_of", "acomp", "xcomp", "compound"])):
            if(j[0]==i[0]):
                filist.append(j[1])
            else:
                filist.append(j[0])
    fcluster.append([i[0], filist])
print(fcluster)

finalcluster = []
dic = {}
for i in featureList:
    dic[i[0]] = i[1]
for i in fcluster:
    if(dic[i[0]]=='NN'):
        finalcluster.append(i)
print(finalcluster)
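For reference, the kind of negation handling being asked about can be sketched in isolation. This is a standalone illustration, not a fix wired into the pipeline above: the triples are hand-written stand-ins for the dep_node output, and attach_negations is a hypothetical helper name, not anything from stanfordnlp.

```python
# Standalone sketch of negation handling on dependency triples shaped
# like the dep_node list above: [dependent, governor, relation].
# The triples here are hand-written (hypothetical) for the sentence
# "the soundquality is not great but the batterylife is not bad."

NEGATORS = {"not", "no", "never", "n't"}

def attach_negations(dep_node):
    """Map each negated adjective to a 'not <adjective>' string."""
    negated = {}
    for dependent, governor, relation in dep_node:
        # UD-style parses label negation as "advmod" (or "neg" in older schemes)
        if relation in ("neg", "advmod") and dependent in NEGATORS:
            negated[governor] = f"{dependent} {governor}"
    return negated

dep_node = [
    ["soundquality", "great", "nsubj"],
    ["not", "great", "advmod"],
    ["batterylife", "bad", "nsubj"],
    ["not", "bad", "advmod"],
]

negated = attach_negations(dep_node)

# pair each aspect with its (possibly negated) adjective
finalcluster = []
for dependent, governor, relation in dep_node:
    if relation == "nsubj":
        finalcluster.append([dependent, [negated.get(governor, governor)]])

print(finalcluster)
# [['soundquality', ['not great']], ['batterylife', ['not bad']]]
```

The same check (is there a "neg"/"advmod" edge from a negator word into this adjective?) could be applied to the real dep_node list before building finalcluster.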

You might want to try spacy. The following pattern would catch:

  • a noun phrase
  • followed by is or are
  • optionally followed by not
  • followed by an adjective
Alternatively, you can extend the match pattern with information provided by the dependency parser, which adds coverage of adjectival phrases:

output = []
matcher = Matcher(nlp.vocab, validate=True)
matcher.add("mood", None, [{"LOWER": {"IN": ["is", "are"]}}, {"LOWER": {"IN": ["no", "not"]}, "OP": "?"}, {"DEP": "advmod", "OP": "?"}, {"DEP": "acomp"}])
for nc in doc.noun_chunks:
    d = doc[nc.root.right_edge.i+1:nc.root.right_edge.i+1+3]
    matches = matcher(d)
    if matches:
        _, start, end = matches[0]
        output.append((nc.text, d[start+1:end].text))
print(output)
[('The product', 'very good')]

Can I implement this directly in my own code, or should I modify the code block so it works with my own model? This is spacy, slightly different from your nltk, but it produces the expected result. I believe it doesn't generalize, it only matches that morphological sequence: when I tried the sentence "The product is very good." it could not recognize it. To fix that, do I need to add "very" alongside the "no"/"not" part? And what should I do to remove "is" from the output?
import spacy
from spacy.matcher import Matcher
nlp = spacy.load('en_core_web_sm')

output = []
doc = nlp('The product is very good')
matcher = Matcher(nlp.vocab)
matcher.add("mood",None,[{"LOWER":{"IN":["is","are"]}},{"LOWER":{"IN":["no","not"]},"OP":"?"},{"LOWER":"very","OP":"?"},{"POS":"ADJ"}])
for nc in doc.noun_chunks:
    d = doc[nc.root.right_edge.i+1:nc.root.right_edge.i+1+3]
    matches = matcher(d)
    if matches:
        _, start, end = matches[0]
        output.append((nc.text, d[start+1:end].text))
    
print(output)
[('The product', 'very good')]
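On the follow-up points: the slice d[start+1:end] is what already drops "is"/"are" from the captured text (start points at the matched "is"), and the literal "very" can be generalized to a set of intensifiers. The sketch below isolates just that slicing behaviour using a blank pipeline, so no model download is needed; because a blank pipeline has no POS tagger, a literal word list stands in for {"POS": "ADJ"}, and it uses the spaCy 3.x Matcher.add signature. Both are assumptions to adjust for your setup.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")        # tokenizer only, no statistical model needed
matcher = Matcher(nlp.vocab)
pattern = [
    {"LOWER": {"IN": ["is", "are"]}},
    {"LOWER": {"IN": ["no", "not"]}, "OP": "?"},
    {"LOWER": {"IN": ["very", "really", "quite"]}, "OP": "?"},  # generalized intensifier slot
    {"LOWER": {"IN": ["good", "great", "bad"]}},  # stand-in for {"POS": "ADJ"}, which needs a tagger
]
matcher.add("mood", [pattern])  # spaCy 3.x; on 2.x use matcher.add("mood", None, pattern)

doc = nlp("the sound quality is not very good")
matches = matcher(doc)
_, start, end = matches[0]
# start is the index of the matched "is"/"are", so start + 1 excludes it
print(doc[start + 1:end].text)
# not very good
```

With a loaded model such as en_core_web_sm you would keep {"POS": "ADJ"} for the last token instead of the word list.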