Python: Loading a parallel corpus with NLTK and lemmatizing the English sentences


I have a corpus in the following format:

sentence in english \t sentence in french \t score
sentence in english \t sentence in french \t score

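Each sentence is already tokenized (tokens are separated by whitespace).

Now I need to load these sentences with NLTK. How can I do that? Which corpus reader methods can I use?

As an example of what I am after, I can load the comtrans corpus that ships with NLTK: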

from nltk.corpus.util import LazyCorpusLoader
from nltk.corpus.reader import AlignedCorpusReader

# Lazily load the comtrans corpus (requires nltk.download('comtrans'))
comtrans = LazyCorpusLoader(
    'comtrans', AlignedCorpusReader, r'(?!\.).*\.txt',
    encoding='iso-8859-1')

# First aligned English-French sentence pair
fe = comtrans.aligned_sents('alignment-en-fr.txt')[0]
print(fe)

In fact, I need to do the same thing, but with a file that I create myself.
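One possible way to read such a file is sketched below. As far as I can tell, AlignedCorpusReader expects the comtrans layout (source line, target line, alignment line), which differs from the tab-separated format above, so this sketch parses the lines in plain Python and only optionally wraps each pair in an NLTK AlignedSent. The names ParallelPair, read_parallel and to_aligned_sents are hypothetical, and treating the score as a float is an assumption.

```python
from typing import Iterable, List, NamedTuple


class ParallelPair(NamedTuple):
    """One line of the corpus: tokenized English, tokenized French, score."""
    en: List[str]
    fr: List[str]
    score: float


def read_parallel(lines: Iterable[str]) -> List[ParallelPair]:
    """Parse 'english \\t french \\t score' lines into ParallelPair objects."""
    pairs = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        en, fr, score = fields
        # Sentences are already tokenized, so a whitespace split is enough.
        # Assumption: the score column is numeric.
        pairs.append(ParallelPair(en.split(), fr.split(), float(score)))
    return pairs


def to_aligned_sents(pairs: List[ParallelPair]) -> list:
    """Optionally wrap pairs in NLTK AlignedSent objects (no alignments set)."""
    from nltk.translate import AlignedSent  # imported lazily; needs NLTK installed
    return [AlignedSent(p.en, p.fr) for p in pairs]


# Small in-memory sample standing in for a real file.
sample = ["the cat sleeps\tle chat dort\t0.9\n"]
pairs = read_parallel(sample)
print(pairs[0].en, pairs[0].fr, pairs[0].score)
```

For a real corpus, an open file object can be passed in directly, for example read_parallel(open(path, encoding='utf-8')), where path points to your own file.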

As a final step, I need to lemmatize every word in the English sentences.
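A minimal sketch of that lemmatization step, assuming NLTK's WordNetLemmatizer is acceptable and the 'wordnet' data has been downloaded. The helper names lemmatize_tokens and wordnet_lemmatizer are hypothetical, and lowercasing each token before lookup is an assumption, not something the question asks for.

```python
from typing import Callable, List


def lemmatize_tokens(tokens: List[str], lemmatize: Callable[[str], str]) -> List[str]:
    """Apply a word-level lemmatizer to every token of one sentence."""
    # Assumption: tokens are lowercased first so dictionary lookups can match.
    return [lemmatize(tok.lower()) for tok in tokens]


def wordnet_lemmatizer() -> Callable[[str], str]:
    """Return NLTK's WordNet lemmatize function.

    Imported lazily; needs NLTK and nltk.download('wordnet').
    """
    from nltk.stem import WordNetLemmatizer
    return WordNetLemmatizer().lemmatize


# Example with NLTK (english_sentences is a hypothetical list of token lists):
# lemmatize = wordnet_lemmatizer()
# lemmas = [lemmatize_tokens(sent, lemmatize) for sent in english_sentences]
```

Note that WordNetLemmatizer.lemmatize treats every word as a noun unless a pos argument is given, so verb forms may come back unchanged unless the tokens are POS-tagged first.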

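Why do you want to use a corpus reader rather than Python's built-in functions to read the corpora?

Because I need to run some NLP processing on these parallel sentences with NLTK, so I thought a corpus reader would be the most convenient option.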