What is the Python equivalent of R's data.chisq$residuals?
I have the following data:
array([[33, 250, 196, 136, 32],
[55, 293, 190, 71, 13]])
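I can get the p-value from stats.chi2_contingency.
Is there anything similar to R's data.chisq$residuals object for getting the Pearson residuals and the standardized residuals?
If you don't mind the dependency, statsmodels has a module for contingency tables. For example,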
In [2]: import numpy as np
In [3]: import statsmodels.api as sm
In [4]: F = np.array([[33, 250, 196, 136, 32], [55, 293, 190, 71, 13]])
In [5]: table = sm.stats.Table(F)
In [6]: table.resid_pearson # Pearson's residuals
Out[6]:
array([[-1.77162519, -1.61362277, -0.05718356, 2.96508777, 1.89079393],
[ 1.80687785, 1.64573143, 0.05832142, -3.02408853, -1.92841787]])
In [7]: table.standardized_resids # Standardized residuals
Out[7]:
array([[-2.62309082, -3.0471942 , -0.09791681, 4.6295814 , 2.74991911],
[ 2.62309082, 3.0471942 , 0.09791681, -4.6295814 , -2.74991911]])
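If you don't want to depend on statsmodels, these calculations can be implemented in a few lines using the result of scipy.stats.chi2_contingency. Here is a short module that defines functions for these residuals. They take the observed frequencies and the expected frequencies (as returned by chi2_contingency). Note that while chi2_contingency and the residuals function below work for n-dimensional arrays, the stdres implemented here only works for 2D arrays.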
from __future__ import division

import numpy as np
from scipy.stats.contingency import margins


def residuals(observed, expected):
    # Pearson residuals: (observed - expected) / sqrt(expected)
    return (observed - expected) / np.sqrt(expected)


def stdres(observed, expected):
    # Standardized residuals; valid for 2D tables only.
    n = observed.sum()
    rsum, csum = margins(observed)
    # With integers, the calculation
    #     csum * rsum * (n - rsum) * (n - csum)
    # might overflow, so convert rsum and csum to floating point.
    rsum = rsum.astype(np.float64)
    csum = csum.astype(np.float64)
    v = csum * rsum * (n - rsum) * (n - csum) / n**3
    return (observed - expected) / np.sqrt(v)
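For reference, the two functions above correspond to the following formulas. The notation is an assumption introduced here for illustration, not part of the original answer: O_ij is the observed count, E_ij the expected count, r_i and c_j the row and column totals, and n the grand total.

```latex
E_{ij} = \frac{r_i \, c_j}{n},
\qquad
\text{residual}_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}},
\qquad
\text{stdres}_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{v_{ij}}},
\quad
v_{ij} = \frac{r_i \, c_j \, (n - r_i)(n - c_j)}{n^{3}}
       = E_{ij}\left(1 - \frac{r_i}{n}\right)\left(1 - \frac{c_j}{n}\right)
```

The last equality shows that the variable v in stdres is the expected count scaled by the two marginal proportions, so a standardized residual is the Pearson residual divided by sqrt((1 - r_i/n)(1 - c_j/n)).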
With your data, we get:
>>> from scipy.stats import chi2_contingency
>>> F = np.array([[33, 250, 196, 136, 32], [55, 293, 190, 71, 13]])
>>> chi2, p, dof, expected = chi2_contingency(F)
>>> residuals(F, expected)
array([[-1.77162519, -1.61362277, -0.05718356, 2.96508777, 1.89079393],
[ 1.80687785, 1.64573143, 0.05832142, -3.02408853, -1.92841787]])
>>> stdres(F, expected)
array([[-2.62309082, -3.0471942 , -0.09791681, 4.6295814 , 2.74991911],
[ 2.62309082, 3.0471942 , 0.09791681, -4.6295814 , -2.74991911]])
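As a cross-check, the same numbers can be reproduced with NumPy alone. This is a minimal sketch, not part of the original answer: the function name chisq_residuals is hypothetical, and it assumes a 2D table in which every row and column total is nonzero.

```python
import numpy as np


def chisq_residuals(observed):
    """Return (pearson, standardized) residuals for a 2D contingency table.

    Hypothetical helper for illustration; assumes no zero row/column totals.
    """
    observed = np.asarray(observed, dtype=np.float64)
    if observed.ndim != 2:
        raise ValueError("this sketch only handles 2D tables")
    n = observed.sum()
    rsum = observed.sum(axis=1, keepdims=True)  # row totals, shape (rows, 1)
    csum = observed.sum(axis=0, keepdims=True)  # column totals, shape (1, cols)
    expected = rsum * csum / n                  # expected counts under independence
    pearson = (observed - expected) / np.sqrt(expected)
    # Dividing by the marginal proportions turns Pearson residuals
    # into standardized residuals.
    std = pearson / np.sqrt((1 - rsum / n) * (1 - csum / n))
    return pearson, std


F = np.array([[33, 250, 196, 136, 32], [55, 293, 190, 71, 13]])
pearson, std = chisq_residuals(F)
print(pearson)
print(std)
```

In a table with two rows, the standardized residuals in each column are equal in magnitude and opposite in sign, which matches the output shown above.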
Here is the same calculation in R, for comparison:
> F <- as.table(rbind(c(33, 250, 196, 136, 32), c(55, 293, 190, 71, 13)))
> result <- chisq.test(F)
> result$residuals
            A           B           C           D           E
A -1.77162519 -1.61362277 -0.05718356  2.96508777  1.89079393
B  1.80687785  1.64573143  0.05832142 -3.02408853 -1.92841787
> result$stdres
            A           B           C           D           E
A -2.62309082 -3.04719420 -0.09791681  4.62958140  2.74991911
B  2.62309082  3.04719420  0.09791681 -4.62958140 -2.74991911
Thanks @Warren! I thought there was some built-in functionality that I wasn't aware of.