Probability of 'beginning' given 'the' in Python? [python, nltk, corpus, tagged-corpus]


Note: the probabilities you give as answers must be probabilities that can be computed from this corpus.

Hi, could you help me with something? This is from the NLTK book. When I work it out I get 78%, which makes no sense. I tried to compute it in Python.

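There is a difference between the joint (intersection) probability of 'the' and 'beginning' and the conditional probability of 'beginning' given 'the'.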

Using an NLTK Conditional Frequency Distribution and the nltk.bigrams function, train a bigram model on the Genesis:

import nltk

text = nltk.corpus.genesis.words('english-kjv.txt')
bigrams = nltk.bigrams(text)
cfd = nltk.ConditionalFreqDist(bigrams)
Answer the following questions:

What is the Probability of ‘begining’ given ‘the’?
What is the probability of ‘the’?

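The joint probability is written with a comma, while the conditional probability of 'beginning' given 'the' is written with a bar:

p('beginning','the')
p('beginning'|'the')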
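To make the joint/conditional distinction concrete, here is a minimal sketch on a toy token list. The token list and the helper names `joint_probability` and `conditional_probability` are assumptions made for illustration; they are not part of NLTK or of the exercise.

```python
from collections import Counter

# Toy token list standing in for a corpus (an assumption for illustration).
tokens = "in the beginning god created the heaven and the earth".split()

# Count adjacent word pairs (first_word, second_word).
bigram_counts = Counter(zip(tokens, tokens[1:]))


def joint_probability(first, second):
    """P(first, second): share of all bigrams that are exactly (first, second)."""
    return bigram_counts[first, second] / sum(bigram_counts.values())


def conditional_probability(second, first):
    """P(second | first): share of bigrams starting with `first` that continue with `second`."""
    starts_with_first = sum(
        count for (w1, _), count in bigram_counts.items() if w1 == first
    )
    if starts_with_first == 0:
        return 0.0
    return bigram_counts[first, second] / starts_with_first


print("P('the', 'beginning') =", joint_probability('the', 'beginning'))
print("P('beginning' | 'the') =", conditional_probability('beginning', 'the'))
# Word order matters: the bigram "beginning the" never occurs in the toy list.
print("P('beginning', 'the') =", joint_probability('beginning', 'the'))
```

The joint probability divides by the total number of bigrams, whereas the conditional probability divides only by the number of bigrams that start with the given word, so the two values generally differ.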

Try:

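from collections import Counter

import nltk

# Requires the Genesis corpus: nltk.download('genesis')
text = nltk.corpus.genesis.words('english-kjv.txt')
bigrams = nltk.bigrams(text)
cfd_bigrams = Counter(bigrams)   # counts of (first_word, second_word) pairs
cfd_unigrams = Counter(text)     # counts of single words

total_bigrams = sum(cfd_bigrams.values())

# Joint probability of the bigram "said unto".
print("p('said','unto') =", cfd_bigrams['said', 'unto'] / total_bigrams)

# Note: this divides the joint probability by the raw count of 'unto', not by its probability.
print("p('said'|'unto') =", (cfd_bigrams['said', 'unto'] / total_bigrams) / cfd_unigrams['unto'])

# Raw count of the bigram "beginning the" (word order matters).
print("p('beginning','the') =", cfd_bigrams['beginning', 'the'])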

Zero; that is not how 'beginning' is spelled :)
Oh my god, genius... So what about this one, then? I still get 78%.

[out]:

p('said','unto') = 0.00397649844738
p('said'|'unto') = 6.73982787691e-06
p('beginning','the') = 0
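For the exercise as stated, the conditional distribution can be read directly from the `ConditionalFreqDist`, which conditions on the first word of each bigram. The following is a minimal sketch, not a definitive solution: the function names `bigram_model_probabilities` and `genesis_example` are made up for illustration, and it assumes the Genesis corpus has already been downloaded (for example with `nltk.download('genesis')`).

```python
import nltk


def bigram_model_probabilities(words, first, second):
    """Return (P(second | first), P(first)) estimated from a token sequence."""
    words = list(words)
    # Each bigram (w1, w2) is treated as (condition, sample), so cfd[w1]
    # is the frequency distribution of words that follow w1.
    cfd = nltk.ConditionalFreqDist(nltk.bigrams(words))
    fd = nltk.FreqDist(words)
    # freq() is the relative frequency within a distribution.
    p_second_given_first = cfd[first].freq(second)
    p_first = fd.freq(first)
    return p_second_given_first, p_first


def genesis_example():
    """Print the two probabilities asked for in the exercise (needs the corpus)."""
    text = nltk.corpus.genesis.words('english-kjv.txt')
    p_cond, p_the = bigram_model_probabilities(text, 'the', 'beginning')
    print("P('beginning' | 'the') =", p_cond)
    print("P('the') =", p_the)
```

Calling `genesis_example()` prints both values for the King James text. Note that the lookup is `cfd['the']`, with 'the' as the condition, rather than a count of the reversed pair ('beginning', 'the').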