Python bigram and trigram probabilities


I really need help understanding the process of probability estimation. I have counted the bigrams in my corpus:

import math
import nltk
bigram_p = {}

for sentence in corpus:
    tokens = sentence.split()
    # Add a start symbol so the first word is counted as part of a bigram
    tokens = [START_SYMBOL] + tokens
    bigrams = (tuple(nltk.bigrams(tokens)))
    for bigram in bigrams:
        if bigram not in bigram_p:
            bigram_p[bigram] = 1
        else:
            bigram_p[bigram] += 1

        for bigram in bigram_p:
            if bigram[0] == '*':  
                bigram_p[bigram] = math.log(bigram_p[bigram]/unigram_p[('STOP',)],2)
            else:
                bigram_p[bigram] = math.log(bigram_p[bigram]/unigram_p[(word[0],)],2)

But I get a key error (math domain error), and I don't understand why. Please explain my mistake and how to handle it.

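I assume you are getting this error on one of the math.log lines. The error simply means that you are passing an argument for which the log operation is undefined, for example: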

import math

# Input is zero
math.log(0)  # ValueError: math domain error

# Input is negative
math.log(-1)  # ValueError: math domain error
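
A small guard can make the failing input easier to spot. The helper below is hypothetical (`checked_log2` is not part of `math` or `nltk`); it is only a sketch that wraps `math.log` and reports the offending value instead of the bare "math domain error":

```python
import math


def checked_log2(value, label="value"):
    """Return log base 2 of value; raise a descriptive error if value <= 0."""
    if value <= 0:
        # math.log is undefined for zero and negative input, so fail early
        # with a message that names the quantity being logged.
        raise ValueError("cannot take log of %s = %r (must be > 0)" % (label, value))
    return math.log(value, 2)
```

Calling `checked_log2(ratio, "bigram ratio")` in place of `math.log(ratio, 2)` would show which ratio is zero or negative.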

One of your expressions,

bigram_p[bigram]/unigram_p[('STOP',)]

or

bigram_p[bigram]/unigram_p[(word[0],)]

is producing a zero or negative input.

Note that in Python 2.7 the division operator (/) performs integer division when both arguments are integers, so the result is truncated to an integer:

1 / 2    # => 0, because 1 and 2 are integers
1. / 2   # => 0.5, because 1. is a float
1.0 / 2  # => 0.5, because 1.0 is a float
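
As a minimal sketch of how this truncation leads to the error, assume hypothetical counts of 1 for a bigram and 2 for its first word. Floor division (`//`, which on any Python version behaves like Python 2's `/` on integers) produces 0, and `math.log` then fails:

```python
import math

# Hypothetical counts, chosen only for illustration.
bigram_count = 1
unigram_count = 2

# Floor division reproduces what Python 2's `/` does with two integers.
truncated_ratio = bigram_count // unigram_count  # 0

try:
    math.log(truncated_ratio, 2)
    log_failed = False
except ValueError:
    # log(0) is undefined, so math.log raises ValueError.
    log_failed = True

# Converting one operand to float gives the intended ratio of 0.5.
true_ratio = float(bigram_count) / unigram_count
log_prob = math.log(true_ratio, 2)  # log2(0.5), i.e. about -1.0
```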

If you want more intuitive division behavior, add the following at the top of your file:

from __future__ import division

If you want to learn more, see the following introduction.

Edit:

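If you can't or don't want to use the import trick, you can convert a number to a float by multiplying it by a float, n*1.0, or by using the built-in function float(n).

Thank you very much! This was very helpful!

@Repzz If this answer helped you solve your problem, please accept it.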
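
Putting the advice from the answer together, here is one possible sketch of the whole computation. It is not the asker's exact code: it assumes the start symbol is `'*'` and the stop symbol is `'STOP'` (inferred from the question), it uses plain string keys and `collections.Counter` instead of `nltk.bigrams` and tuple-keyed unigrams, and the function name `bigram_log_probs` is made up for illustration. Log-probabilities go into a separate dictionary so the counts are never overwritten, and `float()` forces true division on Python 2 as well:

```python
import math
from collections import Counter

START_SYMBOL = "*"    # assumed, matching the `bigram[0] == '*'` check in the question
STOP_SYMBOL = "STOP"  # assumed, matching the `('STOP',)` key in the question


def bigram_log_probs(corpus):
    """Return {(w1, w2): log2 P(w2 | w1)} from maximum-likelihood counts."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        tokens = [START_SYMBOL] + sentence.split() + [STOP_SYMBOL]
        unigram_counts.update(tokens)
        # Pair each token with its successor to form the bigrams.
        bigram_counts.update(zip(tokens, tokens[1:]))

    log_probs = {}
    for (first, second), count in bigram_counts.items():
        # count >= 1 and unigram_counts[first] >= count, so the ratio lies
        # in (0, 1] and math.log never sees zero or a negative number.
        ratio = float(count) / unigram_counts[first]
        log_probs[(first, second)] = math.log(ratio, 2)
    return log_probs
```

For example, `bigram_log_probs(["a b", "a c"])` gives log2(1/2) for the bigram `("a", "b")`, because `a` occurs twice and is followed by `b` once.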