Python NLTK Naive Bayes classification error

Error message:

Traceback (most recent call last):
  File "/Users/ABHINAV/Documents/test2.py", line 58, in <module>
    classifier = NaiveBayesClassifier.train(trainfeats)
  File "/Library/Python/2.7/site-packages/nltk/classify/naivebayes.py", line 194, in train
    for featureset, label in labeled_featuresets:
ValueError: too many values to unpack
[Finished in 17.0s with exit code 1]

I get this error when I try to run Naive Bayes on a set of data. Here is the code:

import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

def word_feats(words):
    return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

negcutoff = len(negfeats)*3/4
poscutoff = len(posfeats)*3/4


trainfeats=[('good'),('pos'),
('quick'),('pos'),
('easy'),('pos'),
('big'),('pos'),
('iterested'),('pos'),
('important'),('pos'),
('new'),('pos'),
('patient'),('pos'),
('few'),('neg'),
('bad'),('neg'),

]

test=[
('general'),('pos'),
('many'),('pos'),
('efficient'),('pos'),
('great'),('pos'),
('interested'),('pos'),
('top'),('pos'),
('easy'),('pos'),
('big'),('pos'),
('new'),('pos'),
('wonderful'),('pos'),
('important'),('pos'),
('best'),('pos'),
('more'),('pos'),
('patient'),('pos'),
('last'),('pos'),
('worse'),('neg'),
('terrible'),('neg'),
('awful'),('neg'),
('bad'),('neg'),
('minimal'),('neg'),
('incomprehensible'),('neg'),
]

classifier = NaiveBayesClassifier.train(trainfeats)
print 'accuracy:', nltk.classify.util.accuracy(classifier, test)
classifier.show_most_informative_features()

TLDR

You need this:

trainfeats=[('good','pos'),
('quick','pos'),
...
instead of this:

trainfeats=[('good'),('pos'),
('quick'),('pos'),
...

Explanation

The key error is the ValueError: too many values to unpack raised inside NaiveBayesClassifier.train, which you call on this line:

classifier = NaiveBayesClassifier.train(trainfeats)
"Too many values to unpack" means the program expected a certain number of values from an iterable and received more than that. For example, your error message shows the exception being thrown on this line:

for featureset, label in labeled_featuresets: 
This for loop expects pairs in labeled_featuresets; it assigns one member of each pair to featureset and the other to label. If labeled_featuresets actually contained triples, e.g. [(1,2,3), (1,2,3), ...], the program would not know what to do with the third element, so it throws the error.
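
A standalone sketch of that same unpacking (no NLTK needed) makes the failure mode concrete:

# Each item of the iterable must split into exactly two values.
pairs = [(1, 2), (3, 4)]
for featureset, label in pairs:      # fine: every item is a 2-tuple
    pass

triples = [(1, 2, 3), (1, 2, 3)]
for featureset, label in triples:    # ValueError: too many values to unpack
    pass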

Here is what you pass to that function, which I assume ends up as labeled_featuresets:

trainfeats=[('good'),('pos'),
('quick'),('pos'),
('easy'),('pos'),
...
It looks like you are trying to create a list of tuples (which would prevent the error) by indenting the items of the list in pairs, but that is not enough. Python does not infer tuples from indentation, only from parentheses. I think this is what you want:

trainfeats=[('good','pos'),
('quick','pos'),
('easy','pos'),
...

Surrounding each pair with parentheses creates a list of tuples instead of a list of single elements. The trainfeats variable should be:

 trainfeats=[({'good':True,'quick':True,'easy':True,
'big':True,'interested':True,'important':True,
'new':True,'patient':True},'pos'),({'few':True,'bad':True},'neg')]
This is the correct format for a labeled featureset in nltk.
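
If you don't want to write those dictionaries out by hand, a small sketch that reuses the word_feats helper from your own code builds the same structure (the word lists below are just the ones from your trainfeats):

def word_feats(words):
    return dict([(word, True) for word in words])

pos_words = ['good', 'quick', 'easy', 'big', 'interested', 'important', 'new', 'patient']
neg_words = ['few', 'bad']

trainfeats = [(word_feats(pos_words), 'pos'), (word_feats(neg_words), 'neg')]
# -> [({'good': True, 'quick': True, ...}, 'pos'), ({'few': True, 'bad': True}, 'neg')]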

Similarly, the test variable should be:

test=[({'general':True,'many':True,'efficient':True,'great':True,'interested':True,'top':True,'easy':True,'big':True,'new':True,'wonderful':True,'important':True,'best':True,'more':True,'patient':True,'last':True},'pos'),({'worse':True,'terrible':True,'awful':True,'bad':True,'minimal':True,'incomprehensible':True},'neg')]
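
With both variables in that shape, the training and evaluation lines from your script run without the unpacking error (just a sanity-check sketch; with a data set this tiny the accuracy number itself is not meaningful):

import nltk.classify.util
from nltk.classify import NaiveBayesClassifier

classifier = NaiveBayesClassifier.train(trainfeats)
print 'accuracy:', nltk.classify.util.accuracy(classifier, test)
classifier.show_most_informative_features()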

I tried defining the tuples the way you mentioned, but there are still errors. Here is what I get: Traceback (most recent call last): File "Documents/test2.py", line 28, in <module> classifier = NaiveBayesClassifier.train(trainfeats) File "/Library/Python/2.7/site-packages/nltk/classify/naivebayes.py", line 196, in train for fname, fval in featureset.items(): AttributeError: 'str' object has no attribute 'items' [Finished in 2.9s with exit code 1]
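
For reference, that second traceback is what NLTK raises when the first element of each pair is a bare word rather than a feature dictionary; a minimal sketch of the difference, assuming the pairs were written with plain strings (the names broken and fixed are just illustrative):

# The pairs now unpack fine, but train() then calls featureset.items()
# (line 196 in the traceback above), so a bare string as the featureset raises
# AttributeError: 'str' object has no attribute 'items'.
broken = [('good', 'pos'), ('bad', 'neg')]            # featureset is the string 'good'

# Wrapping each word in a one-entry dict gives train() what it expects.
fixed = [({'good': True}, 'pos'), ({'bad': True}, 'neg')]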