Getting "ValueError: concat() expects at least one object!" in a simple Python code block


I ran the following code block in an IPython notebook and got a ValueError. I don't know whether this is a syntax error.

import sys
sys.version
gives me

'2.7.9 |Anaconda 2.2.0 (64-bit)| (default, Dec 18 2014, 16:57:52) [MSC v.1500 64 bit (AMD64)]'
When I run

from nltk.corpus import brown
[(genre, word) for genre in brown.categories() for word in brown.words(categories=genre) ]
I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-94-884c4187e29a> in <module>()
  1 from nltk.corpus import brown
----> 2 [(genre, word) for genre in brown.categories() for word in brown.words(categories=genre) ]

C:\Anaconda\lib\site-packages\nltk\corpus\reader\tagged.pyc in words(self, fileids, categories)
198     def words(self, fileids=None, categories=None):
199         return TaggedCorpusReader.words(
--> 200             self, self._resolve(fileids, categories))
201     def sents(self, fileids=None, categories=None):
202         return TaggedCorpusReader.sents(

C:\Anaconda\lib\site-packages\nltk\corpus\reader\tagged.pyc in words(self, fileids)
 81                                         self._para_block_reader,
 82                                         None)
---> 83                        for (fileid, enc) in self.abspaths(fileids, True)])
 84 
 85     def sents(self, fileids=None):

C:\Anaconda\lib\site-packages\nltk\corpus\reader\util.pyc in concat(docs)
412         return docs[0]
413     if len(docs) == 0:
--> 414         raise ValueError('concat() expects at least one object!')
415 
416     types = set(d.__class__ for d in docs)

ValueError: concat() expects at least one object!

Thanks in advance for any help.

As you can see from the traceback, there is no category by the name genre.

If I just print the categories in the Brown corpus:

for name in brown.categories():
    print name

Outputs:
adventure
belles_lettres
editorial
fiction
government
hobbies
humor
learned
lore
mystery
news
religion
reviews
romance
science_fiction
You can use any of the above categories in the Brown corpus.

Change this:

[(genre, word) for genre in brown.categories() for word in brown.words(categories=genre) ]
to this:

[(genre, word) for genre in brown.categories() for word in brown.words(categories=['news']) ]  # interested in the news category only

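A note on the categories: the outer for loop iterates over all the categories in the corpus, while the inner for loop reads only the category you specify, so the words returned are the same for every genre.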

Your code is correct and works for me. Your Brown corpus data (the text files or the categories file) seems to be corrupted or missing.
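One way to check for missing data, as a rough sketch: `concat()` raises this error when it is handed an empty list, which would happen if a category resolves to no files. The helper below lists such categories. The names `find_empty_categories` and `check_brown` are made up for illustration and are not part of NLTK; the sketch only assumes that the reader offers `categories()` and `fileids(categories=...)`, as `nltk.corpus.brown` does.

```python
def find_empty_categories(reader):
    """Return the categories for which the reader resolves no file ids.

    `reader` is assumed to offer categories() and fileids(categories=...),
    like nltk.corpus.brown. This helper is illustrative, not an NLTK API.
    """
    empty = []
    for category in reader.categories():
        try:
            fileids = reader.fileids(categories=category)
        except ValueError:
            # A reader may raise for a category it cannot resolve;
            # treat that the same as "no files".
            fileids = []
        if not fileids:
            empty.append(category)
    return empty


def check_brown():
    """Report empty Brown categories (needs NLTK and the corpus data)."""
    # Imported lazily so find_empty_categories stays usable without NLTK.
    from nltk.corpus import brown
    return find_empty_categories(brown)
```

If `check_brown()` returns a non-empty list, or the import itself fails, re-downloading the corpus with `nltk.download('brown')` is a reasonable next step.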

I have the same problem. If I use the solution given above:

import nltk
from nltk.corpus import brown

cfd = nltk.ConditionalFreqDist(
           (genre, word)
           for genre in brown.categories()
           for word in brown.words(categories='news'))  # inner loop always reads 'news'
genres=['news','religion','hobbies','science_fiction','romance','humor']
modals=['can','could','may','might','must','will']

cfd.tabulate(conditions=genres, samples=modals)
Output:

                 can could   may might  must  will 
           news    93    86    66    38    50   389 
       religion    93    86    66    38    50   389 
        hobbies    93    86    66    38    50   389 
science_fiction    93    86    66    38    50   389 
        romance    93    86    66    38    50   389 
          humor    93    86    66    38    50   389 
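As you can see, all the rows are identical. The outer for loop iterates over all the categories, while the inner for loop only ever collects words from the 'news' category.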

So that solution is incorrect.
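The effect can be reproduced without NLTK. The sketch below uses a toy dictionary (`toy_corpus`, a made-up stand-in for the Brown corpus) and hypothetical helper names to contrast an inner loop that always reads one fixed category with one that uses the outer loop variable.

```python
from collections import Counter

# Toy stand-in for the corpus: each genre maps to its own words.
toy_corpus = {
    "news": ["will", "will", "can"],
    "romance": ["could", "could", "might"],
}


def pairs_fixed_category(corpus, fixed="news"):
    """Inner loop ignores `genre` and always reads the `fixed` category."""
    return [(genre, word) for genre in corpus for word in corpus[fixed]]


def pairs_per_genre(corpus):
    """Inner loop reads the words of the genre chosen by the outer loop."""
    return [(genre, word) for genre in corpus for word in corpus[genre]]


def count_by_genre(pairs):
    """Count words per genre, roughly what a conditional frequency table holds."""
    counts = {}
    for genre, word in pairs:
        counts.setdefault(genre, Counter())[word] += 1
    return counts


fixed_counts = count_by_genre(pairs_fixed_category(toy_corpus))
genre_counts = count_by_genre(pairs_per_genre(toy_corpus))
```

With the fixed category, `fixed_counts["romance"]` equals `fixed_counts["news"]`, which mirrors the identical rows in the table. With the loop variable, each genre keeps its own counts.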

The fix is to declare the list of categories, genres, first:

genres = ['news', 'religion', 'hobbies', 'science_fiction', 'romance', 'humor']
modals = ['can', 'could', 'may', 'might', 'must', 'will']
cfd = nltk.ConditionalFreqDist(
    (genre, word)
    for genre in genres
    for word in brown.words(categories=genre))  # read the words of each genre in turn

cfd.tabulate(conditions=genres, samples=modals)
Output:

                 can could   may might  must  will
           news    93    86    66    38    50   389 
       religion    82    59    78    12    54    71 
        hobbies   268    58   131    22    83   264 
science_fiction    16    49     4    12     8    16 
        romance    74   193    11    51    45    43 
          humor    16    30     8     8     9    13