Is my frequentist approach correct? A Bayesian approach to AB testing using MCMC. How do I compute the Bayes factor in Python?


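I have been trying to get my head around the frequentist and Bayesian approaches to an AB testing problem on toy data.

The results don't make much sense to me. I am struggling to interpret them, or to tell whether I computed them correctly in the first place (errors are quite likely). Furthermore, after a lot of research, I am still somewhat lost on how to compute the Bayes factor. I have seen packages in R that make this look fairly easy. Alas, I am not familiar with R and would like to be able to solve this in Python.

I would greatly appreciate any help and guidance on this.

Here is the data: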

# imports
import pingouin as pg
import pymc3 as pm
import pandas as pd
import numpy as np
import scipy.stats as scs
import statsmodels.stats.api as sms
import math
import matplotlib.pyplot as plt

# A = control -- B = treatment
a_success = 10730
a_failure = 61988
a_total = a_success + a_failure
a_cr = a_success / a_total

b_success = 10966
b_failure = 60738
b_total = b_success + b_failure
b_cr = b_success / b_total
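
I first ran a power analysis to determine the required number of samples, with a power of 0.8, an alpha of 0.05, and a practical significance of 2%. I am not sure whether I should supply the expected conversion rate, or the baseline plus some proportion of it. Depending on the effect size, the required number of samples increases significantly.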

# determine required sample size 
baseline_rate = a_cr
practical_significance = 0.02
alpha = 0.05
power = 0.8 
nobs1 = None

# is this how to calculate effect size? (2% as an absolute lift)
effect_size = sms.proportion_effectsize(baseline_rate, baseline_rate + practical_significance)  # sample_size -> 5204

# # or this? (2% as a relative lift over the baseline)
# effect_size = sms.proportion_effectsize(baseline_rate, baseline_rate + baseline_rate * practical_significance)  # sample_size -> 228583

sample_size = sms.NormalIndPower().solve_power(effect_size = effect_size, 
                                               power = power, 
                                               alpha = alpha,
                                               nobs1 = nobs1,
                                               ratio = 1)
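
The two `effect_size` lines above differ only in whether the 2% is read as an absolute lift (percentage points) or as a relative lift over the baseline. As a rough cross-check, the sketch below recomputes the per-group sample size from Cohen's h using only the standard library. It assumes a two-sided test with equal group sizes and ignores the negligible far-tail term; `required_n_per_group` is a name made up for this illustration and is not part of statsmodels.

```python
import math
from statistics import NormalDist


def required_n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    # Cohen's h: difference of the arcsine-transformed proportions
    h = 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    # With equal group sizes: n = 2 * ((z_alpha + z_power) / h)^2
    return 2 * ((z_alpha + z_power) / abs(h)) ** 2


baseline = 10730 / (10730 + 61988)  # conversion rate of the control group

n_absolute = required_n_per_group(baseline, baseline + 0.02)  # +2 percentage points
n_relative = required_n_per_group(baseline, baseline * 1.02)  # +2% of the baseline

print(round(n_absolute), round(n_relative))
```

Both results should land close to the figures noted in the comments above, which suggests the large gap comes from the two readings of "2%" rather than from a mistake in the call.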

I then went on to try to determine whether the null hypothesis can be rejected:

# calculate pooled probability
pooled_probability = (a_success + b_success) / (a_total + b_total)

# calculate pooled standard error and margin of error
se_pooled = math.sqrt(pooled_probability * (1 - pooled_probability) * (1 / b_total + 1 / a_total))
z_score = scs.norm.ppf(1 - alpha / 2)
margin_of_error = se_pooled * z_score

# the estimated difference between probability of conversions of both groups
d_hat = (b_success / b_total) - (a_success / a_total)

# test if null hypothesis can be rejected
lower_bound = d_hat - margin_of_error
upper_bound = d_hat + margin_of_error

if practical_significance < lower_bound:
    print("reject null hypothesis -- groups do not have the same conversion rates")
else: 
    print("do not reject the null hypothesis -- groups have the same conversion rates")
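
For comparison, a common frequentist check for two conversion rates is the pooled two-proportion z-test, which tests the difference against zero rather than against the practical-significance threshold. The sketch below is a minimal standard-library version; `two_proportion_ztest` is a hypothetical helper name, and the counts are the ones from the data above.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(success_a, total_a, success_b, total_b):
    """Pooled two-sided z-test for the difference of two proportions (B minus A)."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    # standard error under the null hypothesis of one shared rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = two_proportion_ztest(10730, 10730 + 61988, 10966, 10966 + 60738)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Whether the difference is detectable and whether it is large enough to matter are separate questions: this test addresses the first, while the interval check against `practical_significance` addresses the second.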

First, I know you are supposed to burn a portion of the trace. How do you determine a suitable number of samples to burn?
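
In PyMC3 the warm-up draws are set by the `tune` argument of `pm.sample` and are discarded from the returned trace by default, so manual burn-in is often unnecessary. One common way to judge whether the retained draws have converged is the Gelman-Rubin statistic (R-hat), which compares between-chain and within-chain variance. The sketch below illustrates the classic (non-split) formula on synthetic chains using only the standard library; `gelman_rubin` is a hypothetical helper, not the PyMC3 or ArviZ implementation.

```python
import random
import statistics


def gelman_rubin(chains):
    """Classic R-hat for equal-length chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain variances
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the variance of the chain means
    between = n * statistics.variance(chain_means)
    pooled_var = (n - 1) / n * within + between / n
    return (pooled_var / within) ** 0.5


rng = random.Random(0)
# two synthetic chains exploring the same distribution
mixed = [[rng.gauss(0.15, 0.01) for _ in range(1000)] for _ in range(2)]
# the same chains, but with the second one shifted to a different region
stuck = [mixed[0], [x + 0.05 for x in mixed[1]]]

r_hat_mixed = gelman_rubin(mixed)
r_hat_stuck = gelman_rubin(stuck)
print(r_hat_mixed, r_hat_stuck)
```

Values close to 1 suggest the chains agree; clearly larger values suggest that more tuning or more draws are needed.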

When trying to evaluate the posterior probabilities, is the following code the correct way to do it?

b_lift = (trace['p_B'].mean() - trace['p_A'].mean()) / trace['p_A'].mean() * 100
b_prob = np.mean(trace["delta"] > 0)

a_lift = (trace['p_A'].mean() - trace['p_B'].mean()) / trace['p_B'].mean() * 100
a_prob = np.mean(trace["delta"] < 0)

# is the Bayes Factor just the ratio of the posterior probabilities for these two models?
BF = (trace['p_B'] / trace['p_A']).mean()

print(f'There is {b_prob} probability B outperforms A by a magnitude of {round(b_lift, 2)}%') 
print(f'There is {a_prob} probability A outperforms B by a magnitude of {round(a_lift, 2)}%') 
print('BF:', BF)
-- output:
There is 0.666 probability B outperforms A by a magnitude of 1.29%
There is 0.334 probability A outperforms B by a magnitude of -1.28%
BF: 1.013357654428127
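
Because a Beta prior is conjugate to Bernoulli data, each rate in this model has a closed-form posterior, Beta(1 + successes, 1 + failures), so quantities such as P(p_B > p_A) and the relative lift can be cross-checked without MCMC. The sketch below draws from those posteriors with the standard library. It uses the full counts rather than the first 1000 observations, so its numbers are not expected to match the output above; the number of draws and the seed are arbitrary choices.

```python
import random

rng = random.Random(42)
n_draws = 20000

# posterior draws under a Beta(1, 1) prior: Beta(1 + successes, 1 + failures)
p_a = [rng.betavariate(1 + 10730, 1 + 61988) for _ in range(n_draws)]
p_b = [rng.betavariate(1 + 10966, 1 + 60738) for _ in range(n_draws)]

# share of joint draws in which B converts better than A
prob_b_better = sum(b > a for a, b in zip(p_a, p_b)) / n_draws

# posterior mean of the relative lift of B over A, computed per draw
lift_pct = sum((b - a) / a for a, b in zip(p_a, p_b)) / n_draws * 100

print(f"P(p_B > p_A) = {prob_b_better:.3f}, mean relative lift = {lift_pct:.2f}%")
```

Note that the lift here is averaged per draw, whereas the code above divides the difference of the posterior means; for posteriors this narrow the two are nearly identical.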

I suspect this is not the correct way to compute the Bayes factor. How should the Bayes factor be computed?
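
The mean of `p_B / p_A` over the trace estimates the ratio of the two rates; it is not a Bayes factor, which compares the marginal likelihoods of two models. As one possible sketch (the choice of models is an assumption, not the only option): H1 gives each group its own rate with independent Beta(a, b) priors, and H0 gives both groups one shared rate with a Beta(a, b) prior. For these conjugate models the marginal likelihoods reduce to Beta functions, so no sampling is needed. `log_beta` and `log_bayes_factor_10` are names made up for this example.

```python
import math


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_bayes_factor_10(s_a, f_a, s_b, f_b, a=1.0, b=1.0):
    """log BF of H1 (separate rates) over H0 (one shared rate), Beta(a, b) priors."""
    # H1: product of two independent Beta-Bernoulli marginal likelihoods
    log_m1 = (log_beta(s_a + a, f_a + b) + log_beta(s_b + a, f_b + b)
              - 2 * log_beta(a, b))
    # H0: a single Beta-Bernoulli marginal likelihood for the pooled data
    log_m0 = log_beta(s_a + s_b + a, f_a + f_b + b) - log_beta(a, b)
    return log_m1 - log_m0


log_bf10 = log_bayes_factor_10(10730, 61988, 10966, 60738)
bf10 = math.exp(log_bf10)
print(f"log BF10 = {log_bf10:.3f}, BF10 = {bf10:.3f}")
```

A BF10 above 1 favours separate rates and a value below 1 favours a shared rate. The result depends on the priors, so with wide priors and large samples it can favour H0 even when a z-test rejects it.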

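I really hope you can help me understand all of the above. I realize this is a very long post, but I have tried every resource I could find and am still stuck.

Kind regards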

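# Bayesian model (PyMC3) that produces the `trace` used in the posterior code

# generate lists of 1, 0
obs_a = np.repeat([1, 0], [a_success, a_failure])
obs_b = np.repeat([1, 0], [b_success, b_failure])

# shuffle so that the first 1000 entries are a random subset
for _ in range(10):
    np.random.shuffle(obs_a)
    np.random.shuffle(obs_b)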

with pm.Model() as model:
    p_A = pm.Beta("p_A", 1, 1)
    p_B = pm.Beta("p_B", 1, 1)
    
    delta = pm.Deterministic("delta", p_B - p_A)  # delta > 0 means B converts better, as b_prob assumes

    obs_A = pm.Bernoulli("obs_A", p_A, observed = obs_a[:1000])
    obs_B = pm.Bernoulli("obs_B", p_B, observed = obs_b[:1000])
    
    step = pm.NUTS()
    trace = pm.sample(1000, step = step, chains = 2)