Proportion test: z-test vs. bootstrap/permutation gives different results

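I am learning about hypothesis testing and working through the following example:

The CEO of a large electric utility claims that 80% of his 1,000,000 customers are very satisfied with the service they receive. To test this claim, the local newspaper surveyed 100 customers, using simple random sampling. Among the sampled customers, 73% say they are very satisfied. Based on these findings, can we reject the CEO's hypothesis that 80% of the customers are very satisfied? Use a 0.05 level of significance.

I get different results when I calculate the p-value with a one-sample z-test than when I use a bootstrapping approach in Python.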

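Z-test approach:

σ = sqrt[(0.8 * 0.2)/100] = sqrt(0.0016) = 0.04
z = (p̂ - p)/σ = (0.73 - 0.80)/0.04 = -1.75

where p̂ is the sample proportion and p is the hypothesized proportion.

For the two-tailed test, P(z < -1.75) = 0.04 and P(z > 1.75) = 0.04.

So the p-value = 0.04 + 0.04 = 0.08.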
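The hand calculation above can be reproduced with a few lines of standard-library Python. This is only a sketch: the function names `normal_cdf` and `one_sample_prop_ztest` and the parameter names `p_hat`, `p0`, and `n` are illustrative choices, not part of the original question.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sample_prop_ztest(p_hat, p0, n):
    """Return (z, two-tailed p-value) for a one-sample proportion z-test.

    p_hat: observed sample proportion (hypothetical name)
    p0:    proportion claimed under the null hypothesis (hypothetical name)
    n:     sample size
    """
    sigma = math.sqrt(p0 * (1.0 - p0) / n)  # standard error under the null
    z = (p_hat - p0) / sigma
    p_value = 2.0 * normal_cdf(-abs(z))     # both tails, by symmetry
    return z, p_value

z, p_value = one_sample_prop_ztest(p_hat=0.73, p0=0.80, n=100)
print(f"z = {z:.2f}, two-tailed p-value = {p_value:.3f}")
```

With the numbers from the example this gives z of about -1.75 and a p-value of about 0.08, matching the figures worked out by hand.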

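Bootstrapping approach (Python):

The general approach is to repeatedly draw random samples of size 100 from a population of 1,000,000 customers, 80% of whom are satisfied: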

repeat 5000 times:
    take random sample of size 100 from population (1,000,000, 80% of which are satisfied)
    count the number of satisfied customers in sample, and append count to list satisfied_counts
calculate number of times that a value of 73 or more extreme (<73) occurs. Divide this by the number of items in satisfied_counts

Since it's a two-tailed test, double the result to get the p-value.

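The difference is a form of fencepost/rounding error.

The normal approximation says that the chance of getting 0.73 is approximately the chance that the corresponding normal distribution falls between 0.725 and 0.735. You should therefore use 0.735 as the cutoff, which brings the two numbers closer together.

Thanks, that was it!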
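import numpy as np

# Simulation code from the question: 1,000,000 customers, 80% of them satisfied
population = np.array(['satisfied'] * 800000 + ['not satisfied'] * 200000)
num_runs = 5000
sample_size = 100
satisfied_counts = []

for _ in range(num_runs):
    # Draw 100 customers without replacement and count the satisfied ones
    sample = np.random.choice(population, size=sample_size, replace=False)
    satisfied_counts.append(int(np.sum(sample == 'satisfied')))

# Two-tailed p-value: double the share of samples with 73 or fewer satisfied customers
p_val = sum(count <= 73 for count in satisfied_counts) / len(satisfied_counts) * 2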
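The continuity correction suggested in the answer can be checked numerically. The sketch below compares one tail of the normal approximation with cutoffs 0.73 and 0.735 against an exact binomial tail. It assumes that drawing 100 customers without replacement from 1,000,000 is close enough to binomial sampling for this comparison. The helper names `normal_cdf` and `binomial_cdf` are illustrative, and `math.comb` requires Python 3.8 or later.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))

n, p0, observed = 100, 0.80, 73
sigma = math.sqrt(p0 * (1.0 - p0) / n)  # standard error under the null

# Lower tail of the normal approximation with the cutoff at 0.73
tail_plain = normal_cdf((observed / n - p0) / sigma)
# Lower tail with the continuity correction: cutoff moved to 0.735
tail_corrected = normal_cdf(((observed + 0.5) / n - p0) / sigma)
# Exact binomial lower tail, the quantity the simulation estimates
# (under the binomial assumption stated above)
tail_exact = binomial_cdf(observed, n, p0)

for label, tail in [("normal, cutoff 0.73", tail_plain),
                    ("normal, cutoff 0.735", tail_corrected),
                    ("exact binomial", tail_exact)]:
    print(f"{label}: doubled tail = {2 * tail:.3f}")
```

The corrected cutoff gives z = (0.735 - 0.80)/0.04 = -1.625, so the doubled tail rises from about 0.08 to about 0.10. That is closer to what counting samples with 73 or fewer satisfied customers should produce.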