Python: Mann-Whitney confidence interval
I have two datasets (pandas Series), ds1 and ds2, and I want to compute a 95% confidence interval for the difference in means (if the data are normally distributed) or in medians (if they are not).
For the difference in means, I compute the t-test statistic and the CI as follows:
import statsmodels.api as sm
tstat, p_value, dof = sm.stats.ttest_ind(ds1, ds2)
CI = sm.stats.CompareMeans.from_data(ds1, ds2).tconfint_diff()
For the medians, I have:
from scipy.stats import mannwhitneyu
U_stat, p_value = mannwhitneyu(ds1, ds2, use_continuity=True, alternative="two-sided")
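For context, here is a minimal sketch of what the U statistic counts, assuming plain Python lists rather than pandas Series. The helper name `u_statistic` is hypothetical and is not part of SciPy. It only illustrates the textbook definition: the number of pairs in which a value from the first sample exceeds a value from the second, with ties counted as one half.

```python
def u_statistic(x, y):
    """Count pairs (a, b) with a > b, counting ties as 0.5 (hypothetical helper)."""
    total = 0.0
    for a in x:
        for b in y:
            if a > b:
                total += 1.0
            elif a == b:
                total += 0.5
    return total


sample1 = [1, 2, 3]
sample2 = [0, 2, 5]
u1 = u_statistic(sample1, sample2)  # pairs where sample1 wins
u2 = u_statistic(sample2, sample1)  # pairs where sample2 wins
```

The two counts always add up to the number of pairs, `len(sample1) * len(sample2)`. Because the test is built from pairwise comparisons, it is natural that a matching confidence interval is built from the pairwise differences.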
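How do I compute a confidence interval for the difference in medians? I came across a paper, "Calculating confidence intervals for some non-parametric analyses" by Michael J. Campbell and Martin J. Gardner, which gives a formula for the CI.
Based on that: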
from scipy.stats import norm
ct1 = ds1.count() #items in dataset 1
ct2 = ds2.count() #items in dataset 2
alpha = 0.05 #95% confidence interval
N = norm.ppf(1 - alpha/2) # percent point function - inverse of cdf
# The confidence interval for the difference between the two population
# medians is derived through these nxm differences.
diffs = sorted([i-j for i in ds1 for j in ds2])
# For an approximate 100(1-a)% confidence interval first calculate K:
k = int(round(ct1*ct2/2 - (N * (ct1*ct2*(ct1+ct2+1)/12)**0.5)))
# The Kth smallest to the Kth largest of the n x m differences
# ct1 and ct2 should be > ~20
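# With 0-based indexing the Kth smallest is diffs[k-1] and the Kth largest is diffs[len(diffs)-k]
CI = (diffs[k-1], diffs[len(diffs)-k])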
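The steps above can be wrapped into a self-contained function. This is only a sketch under stated assumptions: the name `median_diff_ci` is hypothetical, the inputs are plain iterables of numbers with no missing values, and `statistics.NormalDist` from the standard library stands in for `scipy.stats.norm`. The rounding of K follows the snippet above, so it is worth checking against the rounding convention given in the paper. Strictly speaking, an interval built from pairwise differences covers the location shift between the two samples, which equals the difference in medians only if the two distributions have the same shape.

```python
from math import sqrt
from statistics import NormalDist


def median_diff_ci(x, y, alpha=0.05):
    """Approximate CI from the sorted pairwise differences (hypothetical helper)."""
    x = [float(v) for v in x]
    y = [float(v) for v in y]
    n, m = len(x), len(y)
    # Standard normal quantile for a two-sided 100(1 - alpha)% interval.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # All n * m pairwise differences, sorted ascending.
    diffs = sorted(a - b for a in x for b in y)
    # Normal approximation for K; rounding as in the snippet above (assumption).
    k = int(round(n * m / 2 - z * sqrt(n * m * (n + m + 1) / 12)))
    if k < 1:
        raise ValueError("samples too small for the normal approximation")
    # Kth smallest is index k - 1; Kth largest is index len(diffs) - k.
    return diffs[k - 1], diffs[len(diffs) - k]


# Synthetic data: y is x shifted up by 10.5, so the true shift x - y is -10.5.
x = [float(v) for v in range(1, 26)]
y = [v + 10.5 for v in x]
low, high = median_diff_ci(x, y)
```

With pandas Series, passing `ds1.dropna()` and `ds2.dropna()` would keep the sample sizes consistent with the values that are actually differenced.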