Python: How can I label feature sets more accurately? (python, nltk)

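I'm new to classifier training in nltk, so I tried training a NaiveBayesClassifier on the movie reviews corpus, but noticed that it mislabels negative feature sets as positive. Observe: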

 import collections

 def bag_of_words(words):
   # Map every word to True, giving a presence-only feature dict.
   return dict([(word, True) for word in words])

 def label_feats_from_corpus(corp, feature_detector=bag_of_words):
   # Build {label: [feature dict, ...]} with one feature dict per file.
   label_feats = collections.defaultdict(list)
   for label in corp.categories():
     for fileid in corp.fileids(categories=[label]):
       feats = feature_detector(corp.words(fileids=[fileid]))
       label_feats[label].append(feats)
   return label_feats

 def split_label_feats(lfeats, split=0.75):
   train_feats = []
   test_feats = []
   # items() works on Python 2 and 3; iteritems() exists only on Python 2.
   for label, feats in lfeats.items():
     cutoff = int(len(feats) * split)
     train_feats.extend([(feat, label) for feat in feats[:cutoff]])
     test_feats.extend([(feat, label) for feat in feats[cutoff:]])
   return train_feats, test_feats

 >>> from nltk.corpus import movie_reviews
 >>> from featx import label_feats_from_corpus, split_label_feats
 >>> movie_reviews.categories()
 ['neg', 'pos']
 >>> lfeats = label_feats_from_corpus(movie_reviews)
 >>> lfeats.keys()
 ['neg', 'pos']
 >>> train_feats, test_feats = split_label_feats(lfeats)
 >>> len(train_feats)
 750
 >>> len(test_feats)
 250
 >>> from nltk.classify import NaiveBayesClassifier
 >>> nb_classifier = NaiveBayesClassifier.train(train_feats)
 >>> nb_classifier
 <nltk.classify.naivebayes.NaiveBayesClassifier object at 0x7f1127b50510>
 >>> nb_classifier.labels()
 ['pos']
 >>> from featx import bag_of_words
 >>> negfeat = bag_of_words(['the', 'plot', 'was', 'ludicrous'])
 >>> nb_classifier.classify(negfeat)
 'pos'
 >>> posfeat = bag_of_words(['kate', 'winslet', 'is', 'accessible'])
 >>> nb_classifier.classify(posfeat)
 'pos'

Why does the 'neg' label not show up when I call labels()? Both feature sets are classified as 'pos'. How can I change my code so that the negative feature set is labeled 'neg'?
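
The counts in the session may be a hint. The movie_reviews corpus holds 1000 reviews per category, so a 0.75 split over both labels would be expected to give 1500 training and 500 test feature sets, not 750 and 250. That suggests only one label reached train_feats, which would also explain why labels() returns just ['pos']. Since the actual featx.py is not shown, this is only a guess; one possible cause is the two extend calls sitting outside the for loop. A minimal sketch of how to check the label distribution, using toy data and a helper named label_counts, both hypothetical and not part of nltk or featx:

```python
import collections

def bag_of_words(words):
    # Presence-only features: every word maps to True.
    return dict((word, True) for word in words)

def split_label_feats(lfeats, split=0.75):
    train_feats, test_feats = [], []
    for label, feats in lfeats.items():
        cutoff = int(len(feats) * split)
        # Both extend calls stay inside the loop so every label contributes.
        train_feats.extend((feat, label) for feat in feats[:cutoff])
        test_feats.extend((feat, label) for feat in feats[cutoff:])
    return train_feats, test_feats

def label_counts(labeled_feats):
    # Hypothetical helper: count how many feature sets carry each label.
    return collections.Counter(label for _, label in labeled_feats)

# Toy stand-in for the real corpus (assumption: four documents per label).
toy_lfeats = {
    'neg': [bag_of_words(['ludicrous', 'plot'])] * 4,
    'pos': [bag_of_words(['accessible', 'winslet'])] * 4,
}

train_feats, test_feats = split_label_feats(toy_lfeats)
train_counts = label_counts(train_feats)
test_counts = label_counts(test_feats)
print(train_counts)
print(test_counts)
```

Running label_counts(train_feats) on the real training data should show both 'neg' and 'pos'. If only 'pos' appears, the classifier never saw a negative example and cannot predict 'neg', whatever features it is given.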
