Python NLTK Trainer: can't get scikit-learn classifiers to work

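I am using Python 2.7 and NLTK Trainer, the great tool created by Jacob Perkins. I have used the NaiveBayes classifier successfully, but when I try any of the scikit-learn classifiers, it throws an error. Please help. Here is my command and the resulting error message: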

C:\WINDOWS\system32>C:\Python27\python C:\Users\ned\Desktop\nltk-trainer-master\train_classifier.py --instances files --fraction 0.75 --no-pickle --min_score 2 --ngrams 1 2 3 --show-most-informative 10 movie_reviews --classifier sklearn.MultinomialNB



training sklearn.MultinomialNB classifier
C:\Python27\lib\site-packages\numpy\core\fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
Traceback (most recent call last):
  File "C:\Users\ned\Desktop\nltk-trainer-master\train_classifier.py", line 385, in <module>
    print('accuracy: %f' % accuracy(classifier, test_feats))
  File "C:\Python27\lib\site-packages\nltk\classify\util.py", line 87, in accuracy
    results = classifier.classify_many([fs for (fs, l) in gold])
  File "C:\Python27\lib\site-packages\nltk\classify\scikitlearn.py", line 83, in classify_many
    X = self._vectorizer.transform(featuresets)
  File "C:\Users\ned\Desktop\nltk-trainer-master\sklearn\feature_extraction\dict_vectorizer.py", line 286, in transform
    return self._transform(X, fitting=False)
  File "C:\Users\ned\Desktop\nltk-trainer-master\sklearn\feature_extraction\dict_vectorizer.py", line 196, in _transform
    result_matrix.sort_indices()
  File "C:\Python27\lib\site-packages\scipy\sparse\compressed.py", line 619, in sort_indices
    fn( len(self.indptr) - 1, self.indptr, self.indices, self.data)
  File "C:\Python27\lib\site-packages\scipy\sparse\sparsetools\csr.py", line 546, in csr_sort_indices
    return _csr.csr_sort_indices(*args)
TypeError: Array of type 'byte' required.  Array of type 'bool' given

I am using the following versions:

Python 2.7.10

Python 2.7 numpy 1.9.1

Python 2.7 scikit-learn 0.16.1

Python 2.7 scipy 0.10.1

Python 2.7 NLTK 3.0.4

Argparse 1.3.0

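***Thanks, everyone, for your help. The problem was indeed an outdated library. I installed the latest version from here: and followed the simple installation guide here: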

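I noticed an issue in the project's GitHub repository that shows the exact same error message:

The user there said:

Got it, I trained the classifier on another machine with different versions of scipy and/or sklearn

In your example above, it looks like you trained on the same machine, is that right?

Possibly related? If not, it may at least shed some more light on the problem.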

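On another note, if this is a versioning problem, I would check the version of everything. I believe Python 3 is now more actively developed and supported than 2.7.

You are using scipy 0.10.1, which is several releases old. Try upgrading to scipy 0.14.

Here is an example that works, along with the versions of the packages used: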

$ python
Python 2.7.10 (default, Jul  5 2015, 14:15:43) 
[GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy
>>> scipy.__version__
'0.14.1'
>>> import numpy
>>> numpy.__version__
'1.9.2'
>>> import sklearn
>>> sklearn.__version__
'0.16.1'
>>> import nltk
>>> nltk.__version__
'3.0.4'
>>> import argparse
>>> argparse.__version__
'1.1'

$ python train_classifier.py --instances files --fraction 0.75 --no-pickle --min_score 2 --ngrams 1 2 3 --show-most-informative 10 movie_reviews --classifier sklearn.MultinomialNB
loading movie_reviews
2 labels: [u'neg', u'pos']
calculating word scores
using bag of words from known set feature extraction
71903 words meet min_score and/or max_feats
1500 training feats, 500 testing feats
training sklearn.MultinomialNB with {'alpha': 1.0}
using dtype bool
training sklearn.MultinomialNB classifier
accuracy: 0.788000
neg precision: 0.918605
neg recall: 0.632000
neg f-measure: 0.748815
pos precision: 0.719512
pos recall: 0.944000
pos f-measure: 0.816609
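
The version check from the interactive session above can also be scripted. The following is only a rough sketch: `version_tuple`, `is_older`, and `report` are hypothetical helper names, not part of NLTK Trainer, and the 0.14 threshold is simply the scipy version suggested in this answer rather than a documented minimum. It also prints each module's `__file__`, which shows which copy of a package is actually being imported.

```python
import importlib
import re


def version_tuple(version):
    """Turn '0.10.1' into (0, 10, 1), using the leading digits of each dotted part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric parts such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def is_older(installed, minimum):
    """Return True if the installed version sorts before the minimum."""
    return version_tuple(installed) < version_tuple(minimum)


def report(module_names):
    """Map each module name to (version, file path), or None if it cannot be imported."""
    found = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
            continue
        found[name] = (getattr(module, "__version__", "unknown"),
                       getattr(module, "__file__", "unknown"))
    return found


if __name__ == "__main__":
    for name, info in sorted(report(["numpy", "scipy", "sklearn", "nltk"]).items()):
        if info is None:
            print("%s: not installed" % name)
        else:
            print("%s %s (%s)" % (name, info[0], info[1]))
    # Assumption: 0.14 is the scipy version suggested above, not an official minimum.
    print(is_older("0.10.1", "0.14"))  # True: 0.10.1 predates 0.14
```

The comparison is done on integer tuples rather than on strings, because a plain string comparison would not order versions such as 0.9 and 0.10 correctly.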

Yes, I am using the same machine.