Python: can't follow the NLTK corpus structure


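I am just starting out with NLTK and Python, but I am really confused by the NLTK corpus structure. For example, I don't understand why we need to write words twice after the nltk.corpus module: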

import nltk
wordlist = [w for w in nltk.corpus.words.words('en') if w.islower()]

Also, nltk.corpus.words and nltk.corpus.words.words have different types. Why is that?

type(nltk.corpus)
type(nltk.corpus.words)
type(nltk.corpus.words.words)
nltk.corpus.words.words
... C:\\Documents and Settings\\Administrator\\nltk_data\\corpora\\words'>>

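Third, how is one supposed to know that words has to be written twice after nltk.corpus to get a list of words? In other words, what is the difference between calling nltk.corpus.words and nltk.corpus.words.words?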

Could somebody explain this in detail? Right now it is hard to keep reading chapter 3 of the NLTK book.

Thanks a lot!

It's really simple: words is the name of a class instance contained in nltk.corpus. Here is the relevant code:

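from nltk.corpus.reader import WordListCorpusReader
from nltk.corpus.util import LazyCorpusLoader

words = LazyCorpusLoader('words', WordListCorpusReader, r'(?!README|\.).*')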
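The third argument, r'(?!README|\.).*', is a file-name pattern that is passed on to the reader class. A minimal sketch of which names it accepts, assuming the pattern is matched against the whole file name; select_fileids is a hypothetical helper and the file names are made-up samples, not a listing of the real corpus directory:

```python
import re

# Pattern from the LazyCorpusLoader call above: the negative lookahead
# rejects names starting with "README" or ".", and .* accepts the rest.
FILEID_PATTERN = r'(?!README|\.).*'


def select_fileids(names):
    """Hypothetical helper: keep only the names the pattern accepts."""
    return [name for name in names if re.fullmatch(FILEID_PATTERN, name)]


sample_names = ['en', 'en-basic', 'README', '.hidden']
print(select_fileids(sample_names))  # ['en', 'en-basic']
```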

All this says is that words is an instance of LazyCorpusLoader.

So nltk.corpus.words is simply a reference to that instance.
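The question is really about the difference between an object, one of its methods, and the result of calling that method. Here is a self-contained sketch using a hypothetical WordListReader class. It stands in for NLTK's reader and keeps the entries in memory instead of reading them from disk:

```python
class WordListReader:
    """Hypothetical stand-in for a word-list corpus reader."""

    def __init__(self, entries):
        self._entries = list(entries)

    def words(self):
        # Return a fresh list containing every entry
        return list(self._entries)


# A module-level object playing the role of nltk.corpus.words
words = WordListReader(["apple", "Banana", "cherry"])

print(type(words))        # the class of the reader object
print(type(words.words))  # a bound method of that object
print(words.words())      # calling the method returns a list of words

wordlist = [w for w in words.words() if w.islower()]
print(wordlist)           # ['apple', 'cherry']
```

The first words names the object and the second names its method. Only the call at the end actually produces the list.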

But wait!

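Look at what LazyCorpusLoader is called with: its second argument is WordListCorpusReader. If you look at the code of LazyCorpusLoader, you will see that it hands the real work over to that reader class.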

WordListCorpusReader happens to have a method called words, which looks like this:

def words(self, fileids=None):
    return line_tokenize(self.raw(fileids))
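line_tokenize splits the raw file contents into lines, so each line of the word-list file becomes one entry. Below is a rough stdlib approximation, assuming line_tokenize's default behaviour of discarding blank lines. approx_line_tokenize and raw_text are hypothetical names, not NLTK APIs:

```python
def approx_line_tokenize(text):
    """Approximate line_tokenize: split on line breaks, drop blank lines."""
    return [line for line in text.splitlines() if line.strip()]


# Made-up contents of a word-list file: one entry per line
raw_text = "aardvark\naback\n\nabacus\n"
print(approx_line_tokenize(raw_text))  # ['aardvark', 'aback', 'abacus']
```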

The first time one of its attributes is accessed, LazyCorpusLoader does this:

corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)

Here self.__reader_cls holds the class WordListCorpusReader itself, so this line creates corpus, an instance of WordListCorpusReader (which has its own words method).
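The key point is that self.__reader_cls holds a class, not an instance. Calling that class is what builds the reader. The sketch below uses hypothetical Loader and Reader classes to show this. It also shows that Python mangles double-underscore attribute names, so __reader_cls is actually stored as _Loader__reader_cls:

```python
class Reader:
    """Hypothetical reader class standing in for WordListCorpusReader."""

    def __init__(self, root, fileids):
        self.root = root
        self.fileids = fileids

    def words(self):
        return ["example"]


class Loader:
    """Hypothetical loader that stores a class plus its arguments."""

    def __init__(self, reader_cls, *args, **kwargs):
        self.__reader_cls = reader_cls  # the class object, not an instance
        self.__args = args
        self.__kwargs = kwargs

    def build(self, root):
        # Calling the stored class creates a new instance of it
        return self.__reader_cls(root, *self.__args, **self.__kwargs)


loader = Loader(Reader, fileids=["en"])
corpus = loader.build("/some/root")
print(loader._Loader__reader_cls is Reader)  # True: a class, not an instance
print(isinstance(corpus, Reader))            # True
print(corpus.words())                        # ['example']
```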

It then does this:

self.__dict__ = corpus.__dict__ 
self.__class__ = corpus.__class__
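The two assignments do different jobs. __dict__ carries the instance's attributes (its data), while __class__ decides where methods are looked up. This small sketch uses hypothetical Placeholder and Target classes and applies each assignment separately:

```python
class Target:
    def __init__(self):
        self.name = "target data"

    def describe(self):
        return "method of Target"


class Placeholder:
    def describe(self):
        return "method of Placeholder"


target = Target()
obj = Placeholder()

# Share the attribute dictionary (the same dict object, not a copy)
obj.__dict__ = target.__dict__
print(obj.name)        # -> target data
print(obj.describe())  # -> method of Placeholder (class unchanged so far)

# Change the class, so methods are now looked up on Target
obj.__class__ = target.__class__
print(obj.describe())  # -> method of Target
print(type(obj) is Target)  # -> True
```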

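According to the Python documentation, an object's __dict__ is the dictionary that stores its (writable) attributes. Assigning corpus.__dict__ to self.__dict__ therefore gives the loader all the attributes of corpus. Likewise, the documentation says __class__ is the class to which an instance belongs, so the second assignment changes the loader's class as well. After these two lines, the object named words is, for all practical purposes, a WordListCorpusReader. So nltk.corpus.words.words refers to the words method of the instance named words. Does that make sense? This code illustrates the same behavior: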
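class Bar(object):
    def foo(self):
        return "I am a method of Bar"

class Foo(object):
    def __init__(self, newcls):
        # Create an instance of the class that was passed in...
        instance = newcls()
        # ...then turn this Foo object into that instance's class and attributes
        self.__class__ = instance.__class__
        self.__dict__ = instance.__dict__

foo = Foo(Bar)
print(foo.foo())  # prints: I am a method of Bar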
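The Foo/Bar example performs the swap immediately in __init__. LazyCorpusLoader is lazy instead: it only builds the reader and performs the swap the first time an attribute is requested. The sketch below is a simplified illustration of that idea, not NLTK's actual code. LazyLoader, ListReader and the load_count counter are hypothetical, and the real loader also has to find the corpus files on disk first:

```python
class ListReader:
    """Hypothetical reader standing in for WordListCorpusReader."""

    load_count = 0  # how many readers have been built so far

    def __init__(self, entries):
        ListReader.load_count += 1
        self._entries = list(entries)

    def words(self):
        return list(self._entries)


class LazyLoader:
    """Defers building the reader until an attribute is first requested."""

    def __init__(self, reader_cls, *args, **kwargs):
        self.__reader_cls = reader_cls
        self.__args = args
        self.__kwargs = kwargs

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal lookup fails, which
        # here means only before the swap below has happened.
        corpus = self.__reader_cls(*self.__args, **self.__kwargs)
        # Turn this object into the reader it just created.
        self.__dict__ = corpus.__dict__
        self.__class__ = corpus.__class__
        return getattr(self, attr)


lazy_words = LazyLoader(ListReader, ["apple", "Banana"])
print(type(lazy_words).__name__)  # LazyLoader: nothing loaded yet
print(lazy_words.words())         # ['apple', 'Banana'] (first access loads)
print(type(lazy_words).__name__)  # ListReader
```

After the first access, later calls go straight to ListReader.words, and the reader is not built again.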

Here is also a link to the source code so you can check it yourself: