Python: How to label feature sets more accurately? (Python, NLTK) - Fatal编程技术网

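I'm new to classifier training in NLTK, so I tried training a NaiveBayesClassifier on the movie reviews corpus, but noticed that it mislabels negative feature sets as positive, as shown below: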

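 # featx.py (the module imported in the session below)
 import collections

 def bag_of_words(words):
   # Every word in the document becomes a feature whose value is True.
   return dict([(word, True) for word in words])

 def label_feats_from_corpus(corp, feature_detector=bag_of_words):
   # Map each category label to a list of feature sets, one per file.
   label_feats = collections.defaultdict(list)
   for label in corp.categories():
     for fileid in corp.fileids(categories=[label]):
       feats = feature_detector(corp.words(fileids=[fileid]))
       label_feats[label].append(feats)
   return label_feats

 def split_label_feats(lfeats, split=0.75):
   # Put the first `split` fraction of each label's feature sets in the
   # training list and the rest in the test list, as (featureset, label) pairs.
   train_feats = []
   test_feats = []
   for label, feats in lfeats.items():
     cutoff = int(len(feats) * split)
     train_feats.extend([(feat, label) for feat in feats[:cutoff]])
     test_feats.extend([(feat, label) for feat in feats[cutoff:]])
   return train_feats, test_feats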
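
To see what split_label_feats hands to the classifier, here is a minimal sketch that runs the same splitting logic on a small, hypothetical lfeats dict (made-up toy feature sets rather than the movie_reviews corpus, so it runs without NLTK's data) and counts the (featureset, label) pairs per label with collections.Counter. NLTK's NaiveBayesClassifier.train builds its label set from the training pairs, so a label with no training examples would not be returned by labels(). The helper name label_counts is an illustrative assumption, not part of the original code.

```python
import collections


def split_label_feats(lfeats, split=0.75):
    """Same logic as featx.split_label_feats: the first `split` fraction of
    each label's feature sets goes to training, the rest to testing."""
    train_feats = []
    test_feats = []
    for label, feats in lfeats.items():
        cutoff = int(len(feats) * split)
        train_feats.extend([(feat, label) for feat in feats[:cutoff]])
        test_feats.extend([(feat, label) for feat in feats[cutoff:]])
    return train_feats, test_feats


def label_counts(labeled_feats):
    """Hypothetical helper: count (featureset, label) pairs per label."""
    return collections.Counter(label for _, label in labeled_feats)


# Hypothetical toy data: four bag-of-words feature sets per label.
lfeats = {
    'neg': [{'dull': True}, {'ludicrous': True}, {'boring': True}, {'awful': True}],
    'pos': [{'fun': True}, {'accessible': True}, {'great': True}, {'superb': True}],
}

train_feats, test_feats = split_label_feats(lfeats)
print(label_counts(train_feats))
print(label_counts(test_feats))

# Flag any label that ended up with no training examples at all.
missing = set(lfeats) - set(label_counts(train_feats))
print('labels missing from training data:', sorted(missing))
```

Running label_counts on the real train_feats from the movie_reviews split would show directly whether both 'neg' and 'pos' reach the classifier.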

 >>> from nltk.corpus import movie_reviews
 >>> from featx import label_feats_from_corpus, split_label_feats
 >>> movie_reviews.categories()
 ['neg', 'pos']
 >>> lfeats = label_feats_from_corpus(movie_reviews)
 >>> lfeats.keys()
 ['neg', 'pos']
 >>> train_feats, test_feats = split_label_feats(lfeats)
 >>> len(train_feats)
 750
 >>> len(test_feats)
 250
 >>> from nltk.classify import NaiveBayesClassifier
 >>> nb_classifier = NaiveBayesClassifier.train(train_feats)
 >>> nb_classifier
 <nltk.classify.naivebayes.NaiveBayesClassifier object at 0x7f1127b50510>
 >>> nb_classifier.labels()
 ['pos']
 >>> from featx import bag_of_words
 >>> negfeat = bag_of_words(['the', 'plot', 'was', 'ludicrous'])
 >>> nb_classifier.classify(negfeat)
 'pos'
 >>> posfeat = bag_of_words(['kate', 'winslet', 'is', 'accessible'])
 >>> nb_classifier.classify(posfeat)
 'pos'

Why doesn't the 'neg' label show up when I call the labels() function? And why does the classifier label the negative feature set as 'pos', just like the positive one? How can I change my code so that it labels the negative feature set as 'neg'?
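
As a rough sanity check on the sizes in the session above: NLTK's movie_reviews corpus contains 1,000 'neg' and 1,000 'pos' reviews, so cutting each label at 0.75 should give 1,500 training and 500 test feature sets. The sketch below applies the same per-label cutoff as split_label_feats; the helper expected_split_sizes is hypothetical and exists only for this arithmetic. The observed 750/250 matches what a single label would produce on its own, which is consistent with labels() returning only ['pos'].

```python
def expected_split_sizes(docs_per_label, split=0.75):
    """Hypothetical helper: total train/test sizes when each label's
    documents are cut at int(n * split), as split_label_feats does."""
    train = 0
    test = 0
    for n in docs_per_label.values():
        cutoff = int(n * split)
        train += cutoff
        test += n - cutoff
    return train, test


# Both movie_reviews categories present (1,000 reviews each).
print(expected_split_sizes({'neg': 1000, 'pos': 1000}))  # (1500, 500)

# Only one category's feature sets reach the split.
print(expected_split_sizes({'pos': 1000}))  # (750, 250)
```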