Python for sentiment analysis


I have the sample code below, which uses training and testing data from the nltk corpus and prints out the sentiment of a sentence. What I would like to do is replace the testing dataset with any text of my own.

from nltk.classify import NaiveBayesClassifier
from nltk.corpus import subjectivity
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import *

n_instances = 100

# Each document is represented by a tuple (sentence, label).
# The sentence is tokenized, so it is represented by a list of strings:
subj_docs = [(sent, 'subj') for sent in subjectivity.sents(categories='subj')[:n_instances]]
obj_docs = [(sent, 'obj') for sent in subjectivity.sents(categories='obj')[:n_instances]]

# split subjective and objective instances to keep a balanced uniform class distribution
# in both train and test sets
train_subj_docs = subj_docs[:80]
test_subj_docs = subj_docs[80:100]
train_obj_docs = obj_docs[:80]
test_obj_docs = obj_docs[80:100]
training_docs = train_subj_docs+train_obj_docs
testing_docs = test_subj_docs+test_obj_docs


sentim_analyzer = SentimentAnalyzer()
all_words_neg = sentim_analyzer.all_words([mark_negation(doc) for doc in training_docs])

# simple unigram word features, handling negation
unigram_feats = sentim_analyzer.unigram_word_feats(all_words_neg, min_freq=4)
sentim_analyzer.add_feat_extractor(extract_unigram_feats, unigrams=unigram_feats)

# apply features to obtain a feature-value representation of our datasets
training_set = sentim_analyzer.apply_features(training_docs)
test_set = sentim_analyzer.apply_features(testing_docs)

# train the Naive Bayes classifier on the training set
trainer = NaiveBayesClassifier.train
classifier = sentim_analyzer.train(trainer, training_set)

# output evaluation results
for key,value in sorted(sentim_analyzer.evaluate(test_set).items()):
    print('{0}: {1}'.format(key, value))
So when I try to replace the testing docs with a variable that stores text, for example paragraph = "Hello World, this is a test dataset.", I get this error message: ValueError: too many values to unpack (expected 2)

Does anyone know how to fix this error? Thanks.

That is because testing_docs is not a string, it is a list of tuples. Print out the value of testing_docs from the example; if you want to replace it with a paragraph, make sure you use the same format.
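For example, something along these lines keeps the same format (a rough sketch using the names from the question; the sample paragraph and the 'obj' placeholder label are only for illustration):

from nltk.tokenize import sent_tokenize, word_tokenize

paragraph = "Hello World, this is a test dataset."

# sent_tokenize alone gives a list of plain strings, but testing_docs holds
# (token_list, label) tuples, so build the same shape sentence by sentence
my_docs = [(word_tokenize(sent), 'obj') for sent in sent_tokenize(paragraph)]
# each element is now a ([token, token, ...], label) tuple, like testing_docs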

If you want to understand the error you are getting, you should first read up on and understand tuple unpacking.

This simple example reproduces it:

>>> a = 'abc'
>>> b,c=a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
although I seriously doubt that the first element of each of your tuples is a single character.

My guess is that somewhere in the code you will find a loop that tries to unpack the values of testing_docs into two variables:

for val, category in testing_docs:
    ...

Please post the whole exception.

Print the value of testing_docs — I believe you will find that it is not a string. At a quick glance, I would say it is a list of tuples.

Thanks, I see what you mean, but I am not sure how to fix it. I tried using list = tokenize.sent_tokenize(paragraph), but I still get the same error.

@leo2510 It says so in the comments of the code you posted: "Each document is represented by a tuple (sentence, label). The sentence is tokenized, so it is represented by a list of strings." You can't just pass in any old input you like; you need to look at what the input to the code looks like and then match it...
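For what it is worth, here is a rough sketch of how new text could be fed to the classifier trained above; the sample paragraph and the 'unlabeled' placeholder label are my own, and it assumes the sentim_analyzer and classifier objects from the question are still in scope:

from nltk.tokenize import sent_tokenize, word_tokenize

paragraph = "Hello World, this is a test dataset."

# build documents in the same (token_list, label) shape as testing_docs;
# the true label of new text is unknown, so use a placeholder string
my_docs = [(word_tokenize(sent), 'unlabeled') for sent in sent_tokenize(paragraph)]

# reuse the feature extractors already registered on sentim_analyzer
my_set = sentim_analyzer.apply_features(my_docs)

# predict a label for each sentence with the trained classifier
for feats, _ in my_set:
    print(classifier.classify(feats))

Note that evaluate() only makes sense when the documents carry real gold labels, which is why this calls classify() directly.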