Python information-theoretic measures: computing entropy


I have a corpus consisting of thousands of lines. For simplicity, let's take the corpus to be:

Today is a good day
I hope the day is good today
It's going to rain today
Today I have to study

How can I compute the entropy of the corpus above? The formula for entropy is:

$$H = -\sum_{i} p_i \log p_i$$

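Here is my understanding so far: p_i is the probability of an individual symbol, computed as (frequency of the symbol) / (total number of characters). What I don't understand is the summation: I'm not sure how the sum works in this particular formula.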

I am using Python 3.5.2 for statistical data analysis. It would be great if someone could help me with a code snippet for computing the entropy.

You can use SciPy (scipy.stats.entropy) to compute the entropy.
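As a minimal sketch of the SciPy route, the snippet below applies `scipy.stats.entropy` to the example corpus from the question. Treating each character as a symbol, counting spaces, ignoring line breaks, and keeping upper and lower case distinct are all assumptions made here for illustration; the variable names `corpus`, `counts`, and `h_bits` are likewise illustrative.

```python
from collections import Counter

from scipy.stats import entropy

# The example corpus from the question
corpus = [
    "Today is a good day",
    "I hope the day is good today",
    "It's going to rain today",
    "Today I have to study",
]

# Count every character (assumption: spaces count, line breaks do not)
counts = Counter(ch for line in corpus for ch in line)

# scipy.stats.entropy normalizes the raw counts to probabilities itself;
# base=2 gives the result in bits
h_bits = entropy(list(counts.values()), base=2)
print(h_bits)
```

Passing raw counts works because `scipy.stats.entropy` rescales its input so that it sums to 1 before computing the entropy.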

Or write something like this:

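import math
from collections import Counter


def entropy(string, base=2.0):
    """Return the Shannon entropy of the characters in `string`."""
    # Count how many times each distinct symbol occurs
    counts = Counter(string)

    # Convert the counts to relative frequencies (probabilities)
    total = len(string)
    probs = [count / total for count in counts.values()]

    # Sum -p * log(p) over all distinct symbols, converted to the given base
    return -sum(p * math.log(p) / math.log(base) for p in probs)


print(entropy("Python is not so easy"))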

It returns approximately 3.27280432733.
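To make the summation itself concrete, here is a small sketch that computes one term of the sum per distinct symbol and then adds the terms up. The helper name `entropy_terms` and the toy string "aab" are hypothetical, chosen only for illustration. For "aab", p(a) = 2/3 and p(b) = 1/3, so the sum has exactly two terms, roughly 0.39 and 0.53, which add up to about 0.92 bits.

```python
import math
from collections import Counter


def entropy_terms(text, base=2.0):
    """Return {symbol: -p * log_base(p)} for each distinct symbol in text."""
    counts = Counter(text)
    total = len(text)
    terms = {}
    for symbol, count in counts.items():
        p = count / total                       # p_i = frequency / total characters
        terms[symbol] = -p * math.log(p, base)  # this symbol's term in the sum
    return terms


terms = entropy_terms("aab")
for symbol, term in sorted(terms.items()):
    print(repr(symbol), round(term, 4))

# The entropy is the sum of the per-symbol terms
print("H =", round(sum(terms.values()), 4))
```

The index i in the formula runs over the distinct symbols, not over every character position, which is why the loop iterates over the counted symbols.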

NumPy should also have a method for summing the values of a function over an array.
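One possible NumPy version, offered as a sketch rather than a definitive implementation: `np.unique` with `return_counts=True` yields the symbol counts, and `np.sum` performs the summation over the whole array at once. The function name `entropy_numpy` is hypothetical.

```python
import numpy as np


def entropy_numpy(text, base=2.0):
    """Shannon entropy of the characters in text, computed with NumPy."""
    # Count the occurrences of each distinct character
    _, counts = np.unique(list(text), return_counts=True)

    # Convert the counts to probabilities
    probs = counts / counts.sum()

    # Vectorized sum of -p * log(p), converted to the requested base
    return float(-np.sum(probs * np.log(probs)) / np.log(base))


print(entropy_numpy("Python is not so easy"))
```

For the same input string, this should agree with the pure-Python function above up to floating-point rounding.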