I learned A/B testing from a Youtube vedio. The link is https://www.youtube.com/watch?v=Bu7OqjYk0jM.

I will divide the note into two parts. The first part is generally an overview of hypothesis testing. Most concepts can be found in the article  "Statistics Basics: Main Concepts in Hypothesis Testing" and I will focus on pratical applications here.

 

Actual

Predicted

T (H1) F (H0)
T (H1) TP FP (α)
F (H0) FN (β) TN

  P = TP/(TP+FN)

  R = 1-β =TP/(TP+FN)

Python Example

Case #1: Alternate hypothesis is true

n = 100
p1 = 0.4
p2 = 0.6 # Compute distributions
x = np.arange(0,n+1)
pmf1 = stats.binom.pmf(x,n,p1)
pmf2 = stats.binom.pmf(x,n,p2)
plot(x,pmf1,pmf2)

We can find that the distributions between Coin 1 and Coin 2 are different. We check different values of m1 and m2.

# Example outcomes
m1, m2 = 40, 60
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.007209570764742524 (reject H0)
# Example outcomes
m1, m2 = 43, 57
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.06599205505934735 (accept H0)

In the secod example, m1 and m2 are not different significantly to reject H0.

Case #2: Null hypothesis is true

n = 100
p1 = 0.5
p2 = 0.5 # Compute distributions
x = np.arange(0,n+1)
pmf1 = stats.binom.pmf(x,n,p1)
pmf2 = stats.binom.pmf(x,n,p2)
plot(x,pmf1,pmf2)

In this case, two distributions overlap because we define the same value of p1 and p2.

# Example outcomes
m1, m2 = 49, 51
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.887537083981715 (accept H0)

Actuall, we can only say that m1 and m2 are not different significantly to reject H0. It doesn't mean we should accept H0. The explanation is given in the previous article.

# Example outcomes
m1, m2 = 42, 58
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.033894853524689295 (reject H0)

Firstly, calculate the sample size:

p1, p2 = 0.0500, 0.0515
alpha = 0.05
beta = 0.05 # Evaluate quantile function
p_bar = (p1+p2)/2.0
za = stats.norm.ppf(1-alpha/2) # Two-sided test
zb = stats.norm.ppf(1-beta) # Compute correction factor
A = (za*np.sqrt(2*p_bar*(1-p_bar))+ zb*np.sqrt(p1*(1-p1)+p2*(1-p2)))**2 #Estimate samples required
n = A*(((1+np.sqrt(1+4*(p1-p2)/A)))/(2*(p1-p2)))**2 print (n) # we need 2n users
555118.7638311392

So for test and control combined we'll need at least 2n = 1.1 million users. This is where this stuff gets hard and you're trying to measure something that doesn't happen often which is usally the thing to care because if it's rare, it's usually valuable. When you're trying to change it, you usually can't change it much because if you can change it a lot then your business would be easier usually it's harder to change the thing you care the most about. So in that case, it's like the hardest case where a/b testing the most values are unchange.

Next we can perform a/b testing:

n = 555119
n_trials = 10000 # Simulate experimental results when null is true
control0 = stats.binom.rvs (n,p1,size = n_trials)
test0 = stats.binom.rvs(n, p1, size = n_trials) # Test and control are the same
tables0 = [[[a, n-a], [b, n-b]] for a, b in zip(control0, test0)]
results0 = [stats.chi2_contingency(T) for T in tables0]
decisions0 = [x[1] <= alpha for x in results0] # Simulate experimental results when alternate is true
control1 = stats.binom.rvs (n,p1,size = n_trials)
test1 = stats.binom.rvs(n, p2, size = n_trials) # Test and control are the same
tables1 = [[[a, n-a], [b, n-b]] for a, b in zip(control1, test1)]
results1 = [stats.chi2_contingency(T) for T in tables1]
decisions1 = [x[1] <= alpha for x in results1] # Compute false alarm and correct detection rates
alpha_est = sum(decisions0)/float(n_trials)
power_est = sum(decisions1)/float(n_trials) print('Theoretical false alarm rate = {:0.4f}, '.format(alpha)+
'empirical false alarm rate = {:0.4f}'.format(alpha_est))
print('Theoretical power = {:0.4f}, '.format(1-beta)+
'empirical power = {:0.4f}'.format(power_est))
Theoretical false alarm rate = 0.0500, empirical false alarm rate = 0.0509
Theoretical power = 0.9500, empirical power = 0.9536

A/B Testing with Practice in Python (Part One)的更多相关文章

  1. A/B Testing with Practice in Python (Part Two)

    This is the second part of A/B testing notes, which contains the practical issues and alternatives o ...

  2. [Python + Unit Testing] Write Your First Python Unit Test with pytest

    In this lesson you will create a new project with a virtual environment and write your first unit te ...

  3. Testing shell commands from Python

    如何测试shell命令?最近,我遇到了一些情况,我想运行shell命令进行测试,Python称为万能胶水语言,一些自动化测试都可以完成,目前手头的工作都是用python完成的.但是无法从Python中 ...

  4. [The Basics of Hacking and Penetration Testing] Learn & Practice

    Remember to consturct your test environment. Kali Linux & Metasploitable2 & Windows XP

  5. Automation Testing - Best Practice(书写规范)

    Coding Standards Coding Standards are suggestions that will help us to write automation Scripts code ...

  6. Machine and Deep Learning with Python

    Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...

  7. python安装locustio报错error: invalid command 'bdist_wheel'的解决方法

    locust--scalable user load testing tool writen in Python(是用python写的.规模化.可扩展的测试性能的工具) 安装locustio需要的环境 ...

  8. Python框架、库以及软件资源汇总

    转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世 ...

  9. Awesome Python

    Awesome Python  A curated list of awesome Python frameworks, libraries, software and resources. Insp ...

随机推荐

  1. 《Cracking the Coding Interview》——第3章:栈和队列——题目6

    2014-03-19 03:01 题目:给定一个栈,设计一个算法,在只使用栈操作的情况下将其排序.你可以额外用一个栈.排序完成后,最大元素在栈顶. 解法:我在草稿纸上试了试{1,4,2,3}之类的小例 ...

  2. 【Dual Support Vector Machine】林轩田机器学习技法

    这节课内容介绍了SVM的核心. 首先,既然SVM都可以转化为二次规划问题了,为啥还有有Dual啥的呢?原因如下: 如果x进行non-linear transform后,二次规划算法需要面对的是d`+1 ...

  3. 架构师速成7.3-devops为什么很重要 分类: 架构师速成 2015-07-07 17:22 410人阅读 评论(0) 收藏

    evops是一个很高大上的名字,其实说的简单点就是开发和运维本身就是一个团队的,要干就一起把事情干好.谁出了问题,网站都不行.作为一个架构师,必须要devops,而且要知道如何推行devops. 首先 ...

  4. ImportError: dynamic module does not define module export function (PyInit__caffe)

    使用python3运行caffe,报了该错误. 参考网址:https://stackoverflow.com/questions/34295136/importerror-dynamic-module ...

  5. jQuery制作table表格布局插件带有列左右拖动效果

    压缩包:http://www.xwcms.net/js/bddm/99004.html

  6. android桌面悬浮窗实现

                            首先是一个小的悬浮窗显示的是当前使用了百分之多少的内存,点击一下小悬浮窗,就会弹出一个大的悬浮窗,可以一键加速.好,我们现在就来模拟实现一下类似的效果. ...

  7. UVALive 5029 字典树

    E - Encoded Barcodes Crawling in process...Crawling failedTime Limit:3000MS    Memory Limit:0KB    6 ...

  8. 【DNS】DNS的几个基本概念

    一. 根域 就是所谓的“.”,其实我们的网址www.baidu.com在配置当中应该是www.baidu.com.(最后有一点),一般我们在浏览器里输入时会省略后面的点,而这也已经成为了习惯. 根域服 ...

  9. sqlachemy 原生sql输出

    在创建引擎时,将echo参数配置成True,会输出sql执行语句记录.默认False create_engine(statsticConf.sqlalchemy_mysql,connect_args= ...

  10. npm & npm config

    npm command show npm config https://docs.npmjs.com/cli/config https://docs.npmjs.com/cli/ls https:// ...