Python scipy chisquare test returns a different p-value from Excel and LibreOffice

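After reading an article about an application of the Poisson distribution, I tried to reproduce its findings using Python's "scipy.stats" module, as well as the "POISSON" and "CHITEST" functions of Excel/LibreOffice.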

For the expected values shown in the article, I simply used:

import scipy.stats
for i in range(8):
    print(scipy.stats.poisson.pmf(i, 2)*31)
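This reproduces the table shown in the blog post. I also recreated it in LibreOffice, with a first column A holding the values 0 to 7 in cells A1, A2, ..., A8, and the simple formula "=POISSON(A1, 2, 0)*31" repeated in the first 8 rows of column B.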

So far so good. Now for the p-value of the chi-squared test:

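In LibreOffice, I simply entered the observed values in cells C1-C8 and used "=CHITEST(C1:C8, B1:B8)" to reproduce the p-value of 0.18 reported in the article. Under scipy.stats, however, I can't seem to reproduce this value: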

import numpy as np
import scipy.stats

obs = [4, 10, 7, 5, 4, 0, 0, 1]
exp = [scipy.stats.poisson.pmf(i, 2)*31 for i in range(8)]

# we only estimated one variable (the rate of 2 killings per year via 62/31) 
# so dof will be N-1-estimates
estimates = 1
print(scipy.stats.chisquare(np.array(obs), np.array(exp), ddof=len(obs)-1-estimates))
# (10.112318133864241, 0.0014728159441179519)
# the p-test value reported is 0.00147, not 0.18...
#
# Maybe I need to aggregate categories with observations less than 5 
# (as suggested in many textbooks of statistics for chi-squared tests)?
observedAggregateLessThan5 = [14, 7, 5, 5]
expectedAggregateLessThan5 = [exp[0]+exp[1], exp[2], exp[3], sum(exp[4:])]
print(scipy.stats.chisquare(np.array(observedAggregateLessThan5), np.array(expectedAggregateLessThan5), ddof=len(observedAggregateLessThan5)-1-estimates))
# (0.53561749342466913, 0.46425467595930309)
# Again the p-test value computed is not 0.18, it is 0.46...

What am I doing wrong?

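You are not using the ddof argument correctly. ddof is the change to make to the default degrees of freedom, and the default is one less than the length. Your call passes ddof=len(obs)-1-estimates, which is 6 here and leaves only 8 - 1 - 6 = 1 degree of freedom; that is where the p-value of 0.00147 comes from. To reproduce the CHITEST result you don't have to specify ddof at all: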

In [21]: obs
Out[21]: [4, 10, 7, 5, 4, 0, 0, 1]

In [22]: exp
Out[22]: 
[4.1953937803349941,
 8.3907875606699882,
 8.3907875606699882,
 5.5938583737799901,
 2.796929186889995,
 1.1187716747559984,
 0.37292389158533251,
 0.10654968331009501]

In [23]: chisquare(obs, f_exp=array(exp))
Out[23]: (10.112318133864241, 0.1822973566091409)
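To make the role of ddof concrete, here is a minimal sketch that computes the Pearson statistic by hand and then evaluates the p-value for several degrees of freedom with scipy.stats.chi2.sf. The helper p_value is a hypothetical name introduced only for this illustration; it mimics the convention dof = k - 1 - ddof. The statistic is computed manually rather than through chisquare because, as far as I know, recent SciPy releases reject inputs whose observed and expected totals differ, and the truncated Poisson expected counts here sum to slightly less than 31.

```python
import numpy as np
from scipy.stats import chi2, poisson

obs = np.array([4, 10, 7, 5, 4, 0, 0, 1])
exp = poisson.pmf(np.arange(8), 2) * 31  # expected counts for rate 2 over 31 years

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
stat = float(np.sum((obs - exp) ** 2 / exp))


def p_value(stat, k, ddof=0):
    """Upper-tail p-value using k - 1 - ddof degrees of freedom.

    Hypothetical helper for illustration; it follows the same
    convention as the ddof argument of scipy.stats.chisquare.
    """
    return float(chi2.sf(stat, k - 1 - ddof))


k = len(obs)
p_default = p_value(stat, k)           # dof = 7, matches CHITEST
p_question = p_value(stat, k, ddof=6)  # dof = 1, what the question's call asked for
p_one_est = p_value(stat, k, ddof=1)   # dof = 6, one parameter estimated from the data

print(stat, p_default, p_question, p_one_est)
```

The first two p-values should match the 0.18 and 0.00147 figures quoted above. The third variant, ddof=1, is what you would pass if you wanted to account for the rate having been estimated from the data; whether that adjustment is appropriate depends on how the article's authors set up their test.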