Python: Why does statistics.variance use the "unbiased" sample variance by default?

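I recently started using Python's statistics module.

I noticed that, by default, the variance() function returns the "unbiased" variance, i.e. the sample variance: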

import statistics as st
from random import randint

def myVariance(data):
    # Population variance: mean squared deviation from the mean (divides by N)
    xbar = st.mean(data)
    return sum((x - xbar)**2 for x in data) / len(data)

def myUnbiasedVariance(data):
    # 'Unbiased' sample variance: same sum of squares, divided by N-1
    xbar = st.mean(data)
    return sum((x - xbar)**2 for x in data) / (len(data) - 1)

# 100 random integers drawn uniformly from 0..1000 (inclusive)
population = [randint(0, 1000) for i in range(100)]

print(myVariance(population))

print(myUnbiasedVariance(population))

print(st.variance(population))

Output:

81295.8011
82116.9708081
82116.9708081

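This seems strange to me. I guess that people often work with samples and therefore want the sample variance, but I would expect the default function to compute the population variance. Does anyone know why this is?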

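I would argue that almost every time people estimate a variance from data, they are working with a sample. And by the definition of an unbiased estimator, the expected value of the unbiased variance estimate equals the population variance.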
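The unbiasedness claim can be checked exactly on a toy case. The sketch below is only an illustration: the four-value population and the sample size of 2 are assumptions chosen to keep the enumeration small, and the variable names are hypothetical. It lists every equally likely sample drawn with replacement and averages statistics.variance and statistics.pvariance over all of them, using Fraction so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product
import statistics as st

# Hypothetical toy population, chosen only to keep the enumeration small.
population = [Fraction(v) for v in (1, 2, 3, 4)]
n = 2  # sample size (an assumption for illustration)

# Every equally likely sample of size n drawn with replacement (4**2 = 16).
samples = list(product(population, repeat=n))

# Average each estimator over all possible samples, i.e. its expected value.
mean_variance = st.mean([st.variance(s) for s in samples])    # divides by n - 1
mean_pvariance = st.mean([st.pvariance(s) for s in samples])  # divides by n
true_variance = st.pvariance(population)

print("expected variance(sample): ", mean_variance)
print("expected pvariance(sample):", mean_pvariance)
print("pvariance(population):     ", true_variance)
```

In this toy case the average of variance() over all samples equals the population variance (5/4), while the average of pvariance() comes out at 5/8, which is smaller by the factor (n-1)/n = 1/2.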

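In your code, you use random.randint(0, 1000), which samples from a discrete uniform distribution with 1001 possible values and variance (1001**2 - 1)/12 = 1000*1002/12 = 83500. Here is code showing that, on average, statistics.variance is closer to the population variance than statistics.pvariance when a sample is used as input: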

import random
import statistics as st

# Trial count and sample size are assumptions; a sample size of 10 matches the
# roughly 0.9 ratio between the two sample means in the output below.
TRIALS = 10000
SAMPLE_SIZE = 10

var, pvar = [], []
for i in range(TRIALS):
    sample = [random.randint(0, 1000) for j in range(SAMPLE_SIZE)]
    var.append(st.variance(sample))    # divides by n - 1
    pvar.append(st.pvariance(sample))  # divides by n

print("mean variance(sample):  %.1f" % st.mean(var))
print("mean pvariance(sample): %.1f" % st.mean(pvar))
print("pvariance(population):  %.1f" % st.pvariance(list(range(1001))))

Sample output:

mean variance(sample):  83626.0
mean pvariance(sample): 75263.4
pvariance(population):  83500.0
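As a sanity check on the 83500 figure, the following sketch compares the closed-form variance of a discrete uniform distribution on the integers a..b, ((b - a + 1)**2 - 1)/12, with statistics.pvariance applied to the full list of values. The last line assumes a sample size of n = 10, which is an inference from the ratio of the two sample means in the output above (75263.4 / 83626.0 is about 0.90), not something stated in the thread.

```python
import statistics as st

a, b = 0, 1000           # range used by random.randint(0, 1000)
k = b - a + 1            # number of equally likely values (1001)

# Closed-form variance of a discrete uniform distribution on a..b.
formula = (k**2 - 1) / 12

# Same quantity computed directly from the full list of possible values.
direct = st.pvariance(list(range(a, b + 1)))

# Expected value of pvariance for samples of size n: (n - 1)/n times the
# population variance. n = 10 is an assumption inferred from the output.
n = 10
expected_pvariance = formula * (n - 1) / n

print(formula, direct, expected_pvariance)
```

Under that assumption, the expected pvariance of a sample is 75150, which is in line with the simulated mean of 75263.4.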

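Here is another great post on this. I was wondering exactly the same thing, and the answer to that question really cleared it up for me. With np.var, you can pass the argument ddof=1 to get the unbiased estimate. Take a look:

import numpy as np

print(np.var([1, 2, 3, 4], ddof=1))
1.66666666667

My guess is the principle of least surprise.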