Need advice on negation handling for aspect-based sentiment analysis in Python
I am trying to write Python code that performs aspect-based sentiment analysis on product reviews using a dependency parser. I created an example review:
"The Sound Quality is great but the battery life is bad."
The output is: [['soundquality', ['great']], ['batterylife', ['bad']]]
I can correctly extract the aspects and their adjectives for this sentence, but when I change the text to:
"The Sound Quality is not great but the battery life is not bad."
the output stays the same. How can I add negation handling to my code? And is there any way to improve what I currently have?
import pandas as pd
import numpy as np
import nltk
from nltk.corpus import stopwords
from nltk.corpus import wordnet
from nltk.stem.wordnet import WordNetLemmatizer
import stanfordnlp
stanfordnlp.download('en')
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
txt = "The Sound Quality is not great but the battery life is not bad."
txt = txt.lower()
sentList = nltk.sent_tokenize(txt)
taggedList = []
for line in sentList:
    txt_list = nltk.word_tokenize(line)  # tokenize sentence
    taggedList = taggedList + nltk.pos_tag(txt_list)  # perform POS tagging
print(taggedList)
newwordList = []
flag = 0
for i in range(0, len(taggedList) - 1):
    if taggedList[i][1] == 'NN' and taggedList[i+1][1] == 'NN':
        # merge consecutive nouns: "sound quality" -> "soundquality"
        newwordList.append(taggedList[i][0] + taggedList[i+1][0])
        flag = 1
    else:
        if flag == 1:  # second half of a merged pair: skip it
            flag = 0
            continue
        newwordList.append(taggedList[i][0])
        if i == len(taggedList) - 2:
            newwordList.append(taggedList[i+1][0])
finaltxt = ' '.join(word for word in newwordList)
print(finaltxt)
stop_words = set(stopwords.words('english'))
new_txt_list = nltk.word_tokenize(finaltxt)
wordsList = [w for w in new_txt_list if not w in stop_words]
taggedList = nltk.pos_tag(wordsList)
nlp = stanfordnlp.Pipeline()
doc = nlp(finaltxt)
dep_node = []
for dep_edge in doc.sentences[0].dependencies:
    dep_node.append([dep_edge[2].text, dep_edge[0].index, dep_edge[1]])
for i in range(0, len(dep_node)):
    if int(dep_node[i][1]) != 0:
        dep_node[i][1] = newwordList[int(dep_node[i][1]) - 1]
print(dep_node)
featureList = []
categories = []
totalfeatureList = []
for i in taggedList:
    if i[1] == 'JJ' or i[1] == 'NN' or i[1] == 'JJR' or i[1] == 'NNS' or i[1] == 'RB':
        featureList.append(list(i))
        totalfeatureList.append(list(i))  # stores all the features for every sentence
        categories.append(i[0])
print(featureList)
print(categories)
fcluster = []
for i in featureList:
    filist = []
    for j in dep_node:
        if (j[0] == i[0] or j[1] == i[0]) and (j[2] in ["nsubj", "acl:relcl", "obj", "dobj", "agent", "advmod", "amod", "neg", "prep_of", "acomp", "xcomp", "compound"]):
            if j[0] == i[0]:
                filist.append(j[1])
            else:
                filist.append(j[0])
    fcluster.append([i[0], filist])
print(fcluster)
finalcluster = []
dic = {}
for i in featureList:
    dic[i[0]] = i[1]
for i in fcluster:
    if dic[i[0]] == 'NN':
        finalcluster.append(i)
print(finalcluster)
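One way to bolt negation handling onto the dependency output above: universal-dependency parses typically attach a negator to the word it negates through a "neg" edge (or an "advmod" edge whose child is a negator token). Below is a minimal sketch, assuming dep_node rows have the [child_text, head_word, relation] shape built earlier and finalcluster pairs each aspect with its opinion words; apply_negation and NEGATORS are illustrative names of my own, not part of any library:

```python
# Sketch: prefix "not " to any opinion word that has a negation dependent.
# Assumes dep_node rows look like [child_text, head_word, relation] and
# finalcluster looks like [['soundquality', ['great']], ...].
NEGATORS = {"not", "n't", "no", "never"}

def apply_negation(dep_node, finalcluster):
    # Words that are negated: heads of a "neg" edge, or heads whose
    # adverbial modifier is a negator token.
    negated = {head for child, head, rel in dep_node
               if rel == "neg" or (rel == "advmod" and child in NEGATORS)}
    return [[aspect, ["not " + w if w in negated else w for w in opinions]]
            for aspect, opinions in finalcluster]

dep = [["not", "great", "neg"], ["great", "soundquality", "nsubj"],
       ["not", "bad", "neg"], ["bad", "batterylife", "nsubj"]]
clusters = [["soundquality", ["great"]], ["batterylife", ["bad"]]]
print(apply_negation(dep, clusters))
# [['soundquality', ['not great']], ['batterylife', ['not bad']]]
```

From here you could map "not great" to a flipped polarity score instead of keeping the raw string.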
You may want to try spacy. The following pattern would work:
- a noun phrase
- followed by is or are
- optionally followed by not
- followed by an adjective
output = []
matcher = Matcher(nlp.vocab, validate=True)
matcher.add("mood", None, [{"LOWER": {"IN": ["is", "are"]}}, {"LOWER": {"IN": ["no", "not"]}, "OP": "?"}, {"DEP": "advmod", "OP": "?"}, {"DEP": "acomp"}])
for nc in doc.noun_chunks:
    d = doc[nc.root.right_edge.i+1 : nc.root.right_edge.i+1+3]
    matches = matcher(d)
    if matches:
        _, start, end = matches[0]
        output.append((nc.text, d[start+1:end].text))
print(output)
[('The product', 'very good')]
Can I implement this directly in my own code, or should I modify the code block so it works with my own model? This is spacy, which is slightly different from your nltk, but it produces the expected result. I believe not; only the matched token sequences are captured. When I try the sentence "The product is very good." it is not recognized. To fix this, do I need to add "very" alongside the "no"/"not" part of the pattern? And what should I do to remove "is" from the output?
import spacy
from spacy.matcher import Matcher
nlp = spacy.load('en_core_web_sm')
output = []
doc = nlp('The product is very good')
matcher = Matcher(nlp.vocab)
matcher.add("mood",None,[{"LOWER":{"IN":["is","are"]}},{"LOWER":{"IN":["no","not"]},"OP":"?"},{"LOWER":"very","OP":"?"},{"POS":"ADJ"}])
for nc in doc.noun_chunks:
    d = doc[nc.root.right_edge.i+1 : nc.root.right_edge.i+1+3]
    matches = matcher(d)
    if matches:
        _, start, end = matches[0]
        output.append((nc.text, d[start+1:end].text))
print(output)
[('The product', 'very good')]
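On the two follow-up questions: d[start+1:end] already drops the leading "is"/"are" token from the match, and the optional {"LOWER": "very", "OP": "?"} token is what lets "very" through. A runnable sketch of just the pattern logic, using a blank pipeline so no model download is needed; the {"POS": "ADJ"} token is replaced here with a LOWER-based stand-in word list, which is an assumption for illustration only:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only; LOWER patterns need no tagger/parser
doc = nlp("The product is not very good")

matcher = Matcher(nlp.vocab)
pattern = [
    {"LOWER": {"IN": ["is", "are"]}},
    {"LOWER": {"IN": ["no", "not"]}, "OP": "?"},  # optional negator
    {"LOWER": "very", "OP": "?"},                 # optional intensifier
    {"LOWER": {"IN": ["good", "bad", "great"]}},  # stand-in for {"POS": "ADJ"}
]
matcher.add("mood", [pattern])  # spaCy v3 signature; v2 used matcher.add("mood", None, pattern)

match_id, start, end = matcher(doc)[0]
print(doc[start + 1:end].text)  # start+1 skips "is" -> "not very good"
```

With the real {"POS": "ADJ"} token and a loaded model such as en_core_web_sm, the same span slicing applies: the negator and intensifier stay in the captured text while the copula is dropped.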