Editing vader_lexicon.txt in NLTK for Python to add words related to my domain


python python-3.x nlp


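I am using vader in nltk to find the sentiment of each line in a file. I have two questions:

I need to add words to vader_lexicon.txt, but its syntax looks like this:

attack -2.5 0.92195 [-1, -3, -3, -4, -3, -1, -2, -2, -3]

What do -2.5 and 0.92195 [-1, -3, -3, -4, -3, -1, -2, -2, -3] represent? How should I encode a new word? Suppose I have to add something like '100%' or 'A1'.

I can also see positive and negative word txt files in the nltk_data\corpora\opinion_lexicon folder. How are these used? Can I add my own words to these txt files as well?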

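I believe VADER only uses the word and the first value when classifying text. If you want to add new words, you can simply create a dictionary of words and their sentiment values and add it with the update function: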

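from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Requires the lexicon data: nltk.download('vader_lexicon')
Analyzer = SentimentIntensityAnalyzer()
# your_dictionary maps each new word to its sentiment value, e.g. {'volatile': -1.5}
Analyzer.lexicon.update(your_dictionary)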
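On the file format itself: as far as I know, each line of vader_lexicon.txt is tab-separated and holds the token, the mean sentiment rating, the standard deviation of the ratings, and the list of raw human ratings. NLTK's loader appears to read only the first two fields, which is why a plain word-to-value dictionary is enough. The sketch below parses such a line and builds one for a new word; the helper names, the token 'a1', its ratings, and the use of the population standard deviation are all my own assumptions for illustration.

```python
import ast
import statistics


def parse_lexicon_line(line):
    """Split one tab-separated lexicon line into its four fields."""
    token, mean, std, ratings = line.rstrip("\n").split("\t")
    return token, float(mean), float(std), ast.literal_eval(ratings)


def make_lexicon_line(token, ratings):
    """Build a lexicon-style line from a token and a list of ratings.

    Assumption: the third field is the population standard deviation.
    """
    mean = statistics.mean(ratings)
    std = statistics.pstdev(ratings)
    return f"{token}\t{mean:.1f}\t{std:.5f}\t{list(ratings)}"


# Hypothetical ratings for a made-up domain term
line = make_lexicon_line("a1", [1, 2, 1, 2, 2, 1, 2, 1, 2, 1])
parsed = parse_lexicon_line(line)
print(repr(line))
print(parsed)
```

Only the token and the mean (the first two fields) would matter for scoring if the analyzer ignores the rest, so the remaining fields are mostly there for documentation.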

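You can assign each word a sentiment value by hand, based on its perceived intensity, or, if that is impractical, assign one broad value per category (e.g. -1.5 for negative words and 1.5 for positive words).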
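A minimal sketch of the "broad value per category" approach, assuming you already have a list of positive words and a list of negative words (for example, read from your own txt files). The function name and the sample words are hypothetical. Keys are lowercased because, as far as I know, VADER lowercases each token before looking it up in the lexicon.

```python
def build_domain_lexicon(positive_words, negative_words,
                         pos_value=1.5, neg_value=-1.5):
    """Return a word -> sentiment value dict with one broad value per category."""
    lexicon = {}
    for word in positive_words:
        word = word.strip().lower()
        if word:  # skip blank lines
            lexicon[word] = pos_value
    for word in negative_words:
        word = word.strip().lower()
        if word:
            lexicon[word] = neg_value
    return lexicon


# Hypothetical domain words
financial_lexicon = build_domain_lexicon(
    ["rally", "A1"],
    ["volatile", "calamities"],
)
print(financial_lexicon)
```

The resulting dictionary can then be passed to Analyzer.lexicon.update(financial_lexicon) in the same way as above.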

You can use this script (not mine) to check whether your updates have been picked up:

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Requires the vader_lexicon and punkt tokenizer data from nltk.download()
Analyzer = SentimentIntensityAnalyzer()

sentence = 'enter your text to test'

tokenized_sentence = nltk.word_tokenize(sentence)
pos_word_list = []
neu_word_list = []
neg_word_list = []

# Score each token on its own and bucket it by its compound score
for word in tokenized_sentence:
    compound = Analyzer.polarity_scores(word)['compound']
    if compound >= 0.1:
        pos_word_list.append(word)
    elif compound <= -0.1:
        neg_word_list.append(word)
    else:
        neu_word_list.append(word)

print('Positive:', pos_word_list)
print('Neutral:', neu_word_list)
print('Negative:', neg_word_list)
# Score the sentence as a whole
score = Analyzer.polarity_scores(sentence)
print('\nScores:', score)

After updating VADER with a finance-based lexicon:

Analyzer.lexicon.update(Financial_Lexicon)
sentence = 'stocks were volatile on Tuesday due to the recent calamities in the Chinese market'

Positive: []
Neutral: ['stocks', 'were', 'on', 'Tuesday', 'due', 'to', 'the', 'recent', 'in', 'the', 'Chinese', 'market']
Negative: ['volatile', 'calamities']
Scores: {'neg': 0.294, 'neu': 0.706, 'pos': 0.0, 'compound': -0.6124}
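The compound value above can be sanity-checked by hand. As far as I know, VADER sums the valences of the words it recognises and squashes the sum x with x / sqrt(x^2 + alpha), where alpha = 15, then rounds to four decimal places. Assuming Financial_Lexicon gives -1.5 to both 'volatile' and 'calamities' (an assumption on my part, since the actual values are not shown), the sum is -3.0, which reproduces the reported score:

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the range [-1, 1]."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


# Assumed values; the real Financial_Lexicon entries are not shown in the thread
assumed_valences = {'volatile': -1.5, 'calamities': -1.5}
compound = round(normalize(sum(assumed_valences.values())), 4)
print(compound)  # -0.6124
```

The same formula shows why the 0.1 threshold in the test script works for single words: a lone word with valence 1.5 would get a compound score of about 0.36.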

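Thanks @laurie. One question: if a word I enter does not appear in the lexicon file, it should not get a score. However, I am getting a positive score for input whose words are not in the lexicon txt, which is strange.

Can you give an example? Did you use the test script to check which words are being picked up?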
