Constructing unigrams, bigrams and trigrams in Python (Python, NLTK)

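How can I build the unigrams, bigrams and trigrams of a large corpus and then count how often each one occurs? The results should be sorted from most frequent to least frequent.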

from nltk import word_tokenize
from nltk.util import ngrams
from collections import Counter

text = ("I need to write a program in NLTK that breaks a corpus (a large collection of "
        "txt files) into unigrams, bigrams, trigrams, fourgrams and fivegrams. "
        "I need to write a program in NLTK that breaks a corpus")
# word_tokenize is imported directly above, so call it without the nltk. prefix
token = word_tokenize(text)
bigrams = ngrams(token, 2)
trigrams = ngrams(token, 3)
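In the code above, `ngrams(token, 2)` only returns a lazy iterator of tuples, so `bigrams` and `trigrams` are never actually counted until something consumes them (and they can be consumed only once). As a minimal pure-Python sketch of what that sliding window does, assuming the tokens are already in a list (the helper name `ngrams_of` is hypothetical, not an NLTK function), the window can be built with `zip` and fed straight into a `Counter`; `n=1` gives the unigrams as 1-tuples:

```python
from collections import Counter
from itertools import islice


def ngrams_of(tokens, n):
    """Return an iterator of consecutive n-token tuples (a window of width n)."""
    # zip over n shifted views of the same list; zip stops at the shortest,
    # so k tokens give k - n + 1 tuples (none at all if k < n)
    return zip(*(islice(tokens, i, None) for i in range(n)))


tokens = "I need to write a program I need to write".split()

for n in (1, 2, 3):
    counts = Counter(ngrams_of(tokens, n))
    # most_common(k) returns the k most frequent n-grams, highest count first
    for gram, freq in counts.most_common(3):
        print(n, gram, freq)
```

Calling `most_common()` with no argument returns every n-gram in descending order of frequency, which is the full sorted list the question asks for.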

Try this:

import nltk
from nltk import word_tokenize
from nltk.util import ngrams
from collections import Counter

# word_tokenize needs NLTK's Punkt models; if they are missing, run
# nltk.download('punkt') once ('punkt_tab' in recent NLTK releases)

text = '''I need to write a program in NLTK that breaks a corpus (a large
collection of txt files) into unigrams, bigrams, trigrams, fourgrams and
fivegrams. I need to write a program in NLTK that breaks a corpus'''

token = word_tokenize(text)
# Counter tallies each n-gram tuple; most_common() with no argument
# returns every (n-gram, count) pair, most frequent first
most_frequent_bigrams = Counter(ngrams(token, 2)).most_common()
most_frequent_trigrams = Counter(ngrams(token, 3)).most_common()
for k, v in most_frequent_bigrams:
    print(k, v)
for k, v in most_frequent_trigrams:
    print(k, v)
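The answer's code covers bigrams and trigrams of a single string, but the question also asks for unigrams over a large corpus of txt files. A minimal sketch of that, assuming the corpus is a folder of UTF-8 `.txt` files (the `corpus` directory name and the helpers `tokenize` and `count_ngrams` are hypothetical) and swapping in a simple lowercase regex tokenizer for `nltk.word_tokenize` so it runs without downloaded models:

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    # Assumption: simple regex tokenizer (lowercased runs of word characters),
    # swapped in for nltk.word_tokenize so no tokenizer models are needed
    return re.findall(r"\w+", text.lower())


def count_ngrams(paths, max_n=3):
    """Count 1..max_n-grams across many text files, one file at a time."""
    counters = {n: Counter() for n in range(1, max_n + 1)}
    for path in paths:
        tokens = tokenize(Path(path).read_text(encoding="utf-8"))
        for n, counter in counters.items():
            # sliding window of width n over this file's tokens only,
            # so no n-gram spans two different files
            counter.update(zip(*(tokens[i:] for i in range(n))))
    return counters


if __name__ == "__main__":
    # Hypothetical corpus location: every .txt file directly under ./corpus
    counters = count_ngrams(sorted(Path("corpus").glob("*.txt")))
    for n, counter in counters.items():
        print(f"--- {n}-grams ---")
        for gram, freq in counter.most_common(10):
            print(" ".join(gram), freq)
```

Updating the counters file by file avoids holding the whole corpus as one token list. To keep NLTK's tokenization, replace `tokenize` with `word_tokenize`, and for fourgrams and fivegrams pass `max_n=5`.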

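What have you tried?
I just tried to build them, but it didn't work. I'm a beginner in Python.
Can you show us the code you tried?
I've updated the post.
I strongly suggest looking at the gensim library instead of nltk.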