How can I find the entropy of each column of a dataset with Python?


I used Python to quantize a dataset into 10 levels, so that it looks like this:

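9 9 1 8 9 1
1 9 3 6 1 0
8 3 8 4 4 1
0 2 1 9 9 0

This means that the components 9 9 1 8 9 (the first five values of the first row) belong to class 1 (the last value). I want to find the entropy of each feature column. I wrote the following code, but it has many errors: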

import pandas as pd
import math

f = open ( 'data1.txt' , 'r')

# Finding the probability
df = pd.DataFrame(pd.read_csv(f, sep='\t', header=None, names=['val1', 
    'val2', 'val3', 'val4','val5', 'val6', 'val7', 'val8']))
df.loc[:,"val1":"val5"] = df.loc[:,"val1":"val5"].div(df.sum(axis=0), 
    axis=1)

# Calculating Entropy
def shannon(col):
    entropy = - sum([ p * math.log(p) / math.log(2.0) for p in col])
    return entropy

sh_df = df.loc[:,'val1':'val5'].apply(shannon,axis=0)

Can you correct my code, or do you know of a Python function that finds the entropy of each column of a dataset?

You can use the following script to find the entropy of a column in pandas:

import numpy as np
import pandas as pd


def pandas_entropy(column, base=None):
    """Shannon entropy of a column. Usage: pandas_entropy(df['column1'])

    base defaults to e (result in nats); pass base=2 for bits.
    """
    # Relative frequency of each distinct value in the column
    vc = pd.Series(column).value_counts(normalize=True, sort=False)
    base = np.e if base is None else base
    # H = -sum(p * log_base(p)) over the observed values
    return -(vc * np.log(vc) / np.log(base)).sum()
Just run the function above on each column and it will return the entropy of each one.
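
As a minimal sketch of that last step, assuming the four sample rows from the question are whitespace-separated, `DataFrame.apply` can run the function over every feature column at once. The column names (`val1` to `val5`, `cls`) and the inline sample string are illustrative assumptions, not part of the original data file:

```python
import io

import numpy as np
import pandas as pd


def pandas_entropy(column, base=None):
    # Relative frequency of each distinct value in the column
    vc = pd.Series(column).value_counts(normalize=True, sort=False)
    base = np.e if base is None else base
    return -(vc * np.log(vc) / np.log(base)).sum()


# The four sample rows from the question; the last value in each row is the class.
sample = """9 9 1 8 9 1
1 9 3 6 1 0
8 3 8 4 4 1
0 2 1 9 9 0
"""

# Hypothetical column names, chosen only for this illustration.
names = ["val1", "val2", "val3", "val4", "val5", "cls"]
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=names)

# Apply the entropy function column by column to the five feature columns.
features = df.loc[:, "val1":"val5"]
column_entropy = features.apply(lambda col: pandas_entropy(col, base=2))
print(column_entropy)
```

Here the probabilities are the relative frequencies of each level within a column, not the raw values divided by the column sum. For instance, `val1` holds four distinct values (9, 1, 8, 0), each with probability 1/4, so its entropy in bits is log2(4) = 2.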

Please refer to this answer: SciPy already provides an entropy function (scipy.stats.entropy).
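
A hedged sketch of how the SciPy route might look: `scipy.stats.entropy` accepts unnormalized counts and normalizes them to probabilities itself, so the value counts of a column can be passed in directly. The helper name `scipy_column_entropy` and the small example frame are assumptions made for illustration:

```python
import pandas as pd
from scipy.stats import entropy


def scipy_column_entropy(column, base=None):
    # Hypothetical helper: count each distinct value, then let SciPy
    # normalize the counts and compute the Shannon entropy.
    counts = pd.Series(column).value_counts().to_numpy()
    return entropy(counts, base=base)


# Two feature columns taken from the sample rows in the question.
df = pd.DataFrame({"val1": [9, 1, 8, 0], "val2": [9, 9, 3, 2]})
result = df.apply(lambda col: scipy_column_entropy(col, base=2))
print(result)
```

With `base=None` the result is in nats, which matches the default of the pandas-only function above.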