Installation: How do I install NLTK corpora/models programmatically, i.e. without the GUI downloader?

My project uses NLTK. How can I list the project's corpus and model requirements so that they can be installed automatically? I don't want to click through the nltk.download() GUI, installing packages one by one.

Also, is there any way to "freeze" that same list of requirements (like pip freeze)?

The NLTK website lists a command-line interface for downloading packages and collections at the bottom of this page:

The command-line usage varies depending on which version of Python you are running, but on my Python 2.6 installation I noticed I was missing the 'spanish_grammars' model, and this worked fine:

python -m nltk.downloader spanish_grammars

You mentioned listing the project's corpus and model requirements, and while I'm not sure of a way to do that automatically, I figured I would at least share this.
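
One rough way to approximate such a requirements list (my own sketch, not part of the original answer) is to keep the NLTK package identifiers in a plain text file and download them from a small bootstrap script. The file name nltk_requirements.txt below is purely illustrative:

import nltk

# One NLTK package identifier per line, e.g. "wordnet" or "spanish_grammars".
# The file name is hypothetical; use whatever fits your project.
with open("nltk_requirements.txt") as f:
    packages = [line.strip() for line in f
                if line.strip() and not line.startswith("#")]

for package in packages:
    # nltk.download() returns True on success, False otherwise (see the next answer)
    if not nltk.download(package):
        raise SystemExit("failed to download NLTK package: " + package)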

In addition to the command-line option already mentioned, you can also install NLTK data programmatically from within a Python script by passing an argument to the download() function.

See the help(nltk.download) text for the details.

I can confirm that this works when downloading one package at a time, or when a list or tuple is passed (a sketch of the list form follows the examples below).

>>> import nltk
>>> nltk.download('wordnet')
[nltk_data] Downloading package 'wordnet' to
[nltk_data]     C:\Users\_my-username_\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\wordnet.zip.
True
You can also attempt to download a package that has already been downloaded, without any problems:

>>> nltk.download('wordnet')
[nltk_data] Downloading package 'wordnet' to
[nltk_data]     C:\Users\_my-username_\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
True
Also, the function appears to return a boolean value that you can use to see whether or not the download succeeded:

>>> nltk.download('not-a-real-name')
[nltk_data] Error loading not-a-real-name: Package 'not-a-real-name'
[nltk_data]     not found in index
False
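For example, a minimal sketch of the list form together with the boolean return value (my own addition, assuming the return value behaves the same as in the single-package case shown above):

import nltk

# A list (or tuple) of package identifiers can be passed in a single call.
required = ['punkt', 'wordnet', 'stopwords']
ok = nltk.download(required)
if not ok:
    print("at least one NLTK package failed to download")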

To install all NLTK corpora and models:

python -m nltk.downloader all
Or, on Linux, you can use:

sudo python -m nltk.downloader -d /usr/local/share/nltk_data all
If you just want the most popular corpora and models, replace all with popular.

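The same collection identifiers should also work from Python, if you would rather not shell out (a small sketch, not part of the answer above):

import nltk

# 'popular' (or 'all') is a collection identifier, just like on the command line
nltk.download('popular')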

You can also browse the corpora and models via the command line:

mlee@server:/scratch/jjylee/tests$ sudo python -m nltk.downloader
[sudo] password for jjylee:
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> l
Packages:
  [ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
  [ ] basque_grammars..... Grammars for Basque
  [ ] bllip_wsj_no_aux.... BLLIP Parser: WSJ Model
  [ ] book_grammars....... Grammars from NLTK Book
  [ ] cess_esp............ CESS-ESP Treebank
  [ ] chat80.............. Chat-80 Data Files
  [ ] city_database....... City Database
  [ ] cmudict............. The Carnegie Mellon Pronouncing Dictionary (0.6)
  [ ] comparative_sentences Comparative Sentence Dataset
  [ ] comtrans............ ComTrans Corpus Sample
  [ ] conll2000........... CONLL 2000 Chunking Corpus
  [ ] conll2002........... CONLL 2002 Named Entity Recognition Corpus
  [ ] conll2007........... Dependency Treebanks from CoNLL 2007 (Catalan
                           and Basque Subset)
  [ ] crubadan............ Crubadan Corpus
  [ ] dependency_treebank. Dependency Parsed Treebank
  [ ] europarl_raw........ Sample European Parliament Proceedings Parallel
                           Corpus
  [ ] floresta............ Portuguese Treebank
  [ ] framenet_v15........ FrameNet 1.5
Hit Enter to continue: 
  [ ] framenet_v17........ FrameNet 1.7
  [ ] gazetteers.......... Gazeteer Lists
  [ ] genesis............. Genesis Corpus
  [ ] gutenberg........... Project Gutenberg Selections
  [ ] hmm_treebank_pos_tagger Treebank Part of Speech Tagger (HMM)
  [ ] ieer................ NIST IE-ER DATA SAMPLE
  [ ] inaugural........... C-Span Inaugural Address Corpus
  [ ] indian.............. Indian Language POS-Tagged Corpus
  [ ] jeita............... JEITA Public Morphologically Tagged Corpus (in
                           ChaSen format)
  [ ] kimmo............... PC-KIMMO Data Files
  [ ] knbc................ KNB Corpus (Annotated blog corpus)
  [ ] large_grammars...... Large context-free and feature-based grammars
                           for parser comparison
  [ ] lin_thesaurus....... Lin's Dependency Thesaurus
  [ ] mac_morpho.......... MAC-MORPHO: Brazilian Portuguese news text with
                           part-of-speech tags
  [ ] machado............. Machado de Assis -- Obra Completa
  [ ] masc_tagged......... MASC Tagged Corpus
  [ ] maxent_ne_chunker... ACE Named Entity Chunker (Maximum entropy)
  [ ] moses_sample........ Moses Sample Models
Hit Enter to continue: x


Download which package (l=list; x=cancel)?
  Identifier> conll2002
    Downloading package conll2002 to
        /afs/mit.edu/u/m/mlee/nltk_data...
      Unzipping corpora/conll2002.zip.

---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader>

I installed the corpora and models in a custom directory using the code below:

import nltk
nltk.download(info_or_id="popular", download_dir="/path/to/dir")
nltk.data.path.append("/path/to/dir")
This installs the 'popular' corpora/models in /path/to/dir and tells NLTK where to look for them (data.path.append).
You can't "freeze" the data in a requirements file, but you can add this code to your __init__, together with a check for whether the files already exist (a sketch of such a check follows the code below).

import nltk
nltk.download(info_or_id="popular", download_dir="/path/to/dir")
nltk.data.path.append("/path/to/dir")
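A minimal sketch of that "check whether the data already exists" idea (my own addition, using nltk.data.find(), which raises LookupError when a resource cannot be found; the wordnet probe is just one resource from the popular collection):

import nltk

NLTK_DATA_DIR = "/path/to/dir"
nltk.data.path.append(NLTK_DATA_DIR)

def ensure_nltk_data():
    try:
        # Probe for one resource from the collection as a cheap "already installed?" check.
        nltk.data.find("corpora/wordnet")
    except LookupError:
        nltk.download(info_or_id="popular", download_dir=NLTK_DATA_DIR)

ensure_nltk_data()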