A/B Testing with Practice in Python (Part One)
I learned A/B testing from a YouTube video: https://www.youtube.com/watch?v=Bu7OqjYk0jM.
I will divide these notes into two parts. The first part is an overview of hypothesis testing. Most of the concepts are covered in the article "Statistics Basics: Main Concepts in Hypothesis Testing", so here I will focus on practical applications.
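Predicted \ Actual | T (H1) | F (H0) |
T (H1) | TP | FP (α) |
F (H0) | FN (β) | TN |
α = FP/(FP+TN) is the Type I error rate and β = FN/(TP+FN) is the Type II error rate.
P (precision) = TP/(TP+FP)
R (recall, or power) = 1-β = TP/(TP+FN)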
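As a quick illustration of these definitions, the sketch below computes precision, recall (power), α and β from a confusion matrix laid out as in the table (rows = predicted, columns = actual). The function name and the counts are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Rates from a confusion matrix (rows = predicted, columns = actual)."""
    return {
        "precision": tp / (tp + fp),  # P = TP/(TP+FP)
        "recall": tp / (tp + fn),     # R = power = 1 - beta
        "alpha": fp / (fp + tn),      # Type I error rate: reject H0 when H0 is true
        "beta": fn / (tp + fn),       # Type II error rate: keep H0 when H1 is true
    }


# Hypothetical counts: 100 experiments where H1 is true, 100 where H0 is true
m = confusion_metrics(tp=80, fp=5, fn=20, tn=95)
print(m)
```

Here recall = 80/100 = 0.8 and β = 20/100 = 0.2, so R = 1-β holds, while α = 5/100 = 0.05.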
Python Example
Case #1: Alternative hypothesis is true
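import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
n = 100
p1 = 0.4
p2 = 0.6
# Compute distributions: number of heads in n flips of each coin
x = np.arange(0, n+1)
pmf1 = stats.binom.pmf(x, n, p1)
pmf2 = stats.binom.pmf(x, n, p2)
plt.plot(x, pmf1, label='Coin 1')
plt.plot(x, pmf2, label='Coin 2')
plt.legend()
plt.show()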
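The distributions of Coin 1 and Coin 2 are clearly different. Next, we check a few example outcomes, where m1 and m2 are the numbers of heads observed for Coin 1 and Coin 2 in n flips each, and test them with a chi-square test on the 2×2 table of heads and tails.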
# Example outcomes
m1, m2 = 40, 60
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.007209570764742524 (reject H0)
# Example outcomes
m1, m2 = 43, 57
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.06599205505934735 (accept H0)
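In the second example, the difference between m1 and m2 is not large enough to reject H0 at the 0.05 level, even though H1 is actually true. This is a false negative (Type II error).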
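To see what stats.chi2_contingency computes for these 2×2 tables, here is a hand-rolled version. It is a sketch that assumes SciPy's default behaviour for a 2×2 table, namely Pearson's chi-square with Yates' continuity correction (each |observed - expected| is shrunk by 0.5); the helper name yates_chi2_2x2 is hypothetical.

```python
import numpy as np
from scipy import stats


def yates_chi2_2x2(m1, m2, n):
    """Yates-corrected chi-square test for heads/tails counts of two coins.

    m1, m2: number of heads for coin 1 and coin 2, each flipped n times.
    """
    table = np.array([[m1, n - m1], [m2, n - m2]], dtype=float)
    # Expected counts under H0: row total * column total / grand total
    expected = (table.sum(axis=1, keepdims=True)
                * table.sum(axis=0, keepdims=True) / table.sum())
    # Yates' correction: shrink each |O - E| by 0.5, but never below zero
    diff = np.maximum(np.abs(table - expected) - 0.5, 0.0)
    chi2 = (diff ** 2 / expected).sum()
    pval = stats.chi2.sf(chi2, df=1)  # 2x2 table -> 1 degree of freedom
    return chi2, pval


for m1, m2 in [(40, 60), (43, 57)]:
    chi2, pval = yates_chi2_2x2(m1, m2, 100)
    ref = stats.chi2_contingency([[m1, 100 - m1], [m2, 100 - m2]])
    print(m1, m2, round(chi2, 4), pval, ref[1])
```

For m1, m2 = 40, 60 every expected count is 50 and every |O - E| is 10, so χ² = 4 × 9.5²/50 = 7.22, which corresponds to the p-value of about 0.0072 printed above.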
Case #2: Null hypothesis is true
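n = 100
p1 = 0.5
p2 = 0.5
# Compute distributions: number of heads in n flips of each coin
x = np.arange(0, n+1)
pmf1 = stats.binom.pmf(x, n, p1)
pmf2 = stats.binom.pmf(x, n, p2)
plt.plot(x, pmf1, label='Coin 1')
plt.plot(x, pmf2, label='Coin 2')
plt.legend()
plt.show()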
In this case, the two distributions overlap completely because p1 and p2 are equal.
# Example outcomes
m1, m2 = 49, 51
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.887537083981715 (accept H0)
Actually, we can only say that m1 and m2 are not different enough to reject H0. That does not mean we should accept H0; the explanation is given in the previous article.
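One way to see why "not significant" is not the same as "H0 is true" is to look at how wide the range of plausible differences still is. The sketch below uses a normal-approximation (Wald) 95% confidence interval for p2 - p1, a technique not used elsewhere in these notes and swapped in here only for illustration; the helper name wald_diff_ci is hypothetical.

```python
import numpy as np
from scipy import stats


def wald_diff_ci(m1, m2, n, level=0.95):
    """Normal-approximation CI for p2 - p1 from m1, m2 heads out of n flips each."""
    r1, r2 = m1 / n, m2 / n
    se = np.sqrt(r1 * (1 - r1) / n + r2 * (1 - r2) / n)
    z = stats.norm.ppf(1 - (1 - level) / 2)
    diff = r2 - r1
    return diff - z * se, diff + z * se


lo, hi = wald_diff_ci(49, 51, 100)
print('95% CI for p2 - p1: ({:.3f}, {:.3f})'.format(lo, hi))
```

The interval (about -0.119 to 0.159) contains 0, which is why H0 is not rejected, but it also contains differences of more than 10 percentage points in either direction, so 100 flips per coin are far from showing that the coins are identical.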
# Example outcomes
m1, m2 = 42, 58
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.033894853524689295 (reject H0)
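Here H0 is true, yet we reject it: this is a false positive (Type I error), which the test is designed to keep at or below α = 5%.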
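Because n = 100 is small, we do not have to rely on a single draw: we can sum the binomial probabilities of every (m1, m2) outcome that leads to rejection. The sketch below does this exactly, assuming the same Yates-corrected chi-square test that stats.chi2_contingency applies to 2×2 tables; the function name rejection_probability is hypothetical. With p1 = p2 it returns the true false alarm rate, and with p1 = 0.4, p2 = 0.6 (Case #1) it returns the power.

```python
import numpy as np
from scipy import stats


def rejection_probability(n, p1, p2, alpha=0.05):
    """P(reject H0) for the Yates-corrected chi-square test on two coins flipped n times."""
    m = np.arange(n + 1)
    # Joint probability of every (m1, m2) outcome; the coins are independent
    joint = np.outer(stats.binom.pmf(m, n, p1), stats.binom.pmf(m, n, p2))
    m1, m2 = np.meshgrid(m, m, indexing='ij')
    # In the table [[m1, n-m1], [m2, n-m2]] every cell has |O - E| = |m1 - m2| / 2,
    # and the expected counts are c1/2 (heads column) and c2/2 (tails column)
    c1 = m1 + m2
    c2 = 2 * n - c1
    d = np.maximum(np.abs(m1 - m2) / 2.0 - 0.5, 0.0)  # Yates-corrected |O - E|
    with np.errstate(divide='ignore', invalid='ignore'):
        chi2 = np.where((c1 > 0) & (c2 > 0), d ** 2 * (4.0 / c1 + 4.0 / c2), 0.0)
    reject = stats.chi2.sf(chi2, df=1) < alpha
    return joint[reject].sum()


print('False alarm rate (p1 = p2 = 0.5):', rejection_probability(100, 0.5, 0.5))
print('Power (p1 = 0.4, p2 = 0.6):', rejection_probability(100, 0.4, 0.6))
```

When m1 + m2 = 100, the rejection boundary is |m1 - m2| ≥ 15, consistent with the examples above: 42 vs 58 is rejected while 43 vs 57 is not. The false alarm rate comes out a bit below the nominal 0.05, because Yates' correction makes the test conservative.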
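Next, consider a harder and more realistic setting: a baseline rate of p1 = 5.00% that we hope to raise to p2 = 5.15%, tested with significance level α = 0.05 and power 1-β = 0.95. First, calculate the required sample size per group: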
p1, p2 = 0.0500, 0.0515
alpha = 0.05
beta = 0.05
p_bar = (p1+p2)/2.0
# Evaluate quantile function of the standard normal
za = stats.norm.ppf(1-alpha/2)  # Two-sided test
zb = stats.norm.ppf(1-beta)
# Numerator of the uncorrected sample-size formula
A = (za*np.sqrt(2*p_bar*(1-p_bar)) + zb*np.sqrt(p1*(1-p1)+p2*(1-p2)))**2
# Estimate samples required per group; the continuity correction uses |p1 - p2|
delta = abs(p1-p2)
n = A*((1+np.sqrt(1+4*delta/A))/(2*delta))**2
print(int(np.ceil(n)))  # we need 2n users in total
557786
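Written out, the sample-size calculation is a commonly used formula for comparing two proportions with a continuity correction, where δ denotes the absolute difference |p2 - p1|:

$$
\bar p = \frac{p_1+p_2}{2}, \qquad
A = \left( z_{1-\alpha/2}\sqrt{2\bar p(1-\bar p)} + z_{1-\beta}\sqrt{p_1(1-p_1)+p_2(1-p_2)} \right)^2
$$

$$
n = \frac{A}{4\delta^2}\left(1+\sqrt{1+\frac{4\delta}{A}}\right)^2, \qquad \delta = |p_2-p_1|
$$

Without the correction, n would simply be A/δ². The correction factor (1 + √(1 + 4δ/A))²/4 is always greater than 1, so it increases the required sample size slightly.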
So for the test and control groups combined, we need at least 2n ≈ 1.1 million users. This is where A/B testing gets hard. We are trying to measure something that doesn't happen often (here, a 5% rate), and rare events are usually the valuable ones. The metrics we care about most are also usually the hardest to change: if we could move them a lot, the business would be easy. So the effect we hope to detect is small (5.00% to 5.15% is a 3% relative lift), and a rare event combined with a small effect is the hardest case for A/B testing, because it requires a very large sample.
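To make the point about small effects concrete, the sketch below wraps the same continuity-corrected formula in a function (the name sample_size_per_group is hypothetical) and evaluates it for a few relative lifts over the 5% baseline. Because n grows roughly like 1/(p2 - p1)², halving the lift roughly quadruples the number of users needed.

```python
import numpy as np
from scipy import stats


def sample_size_per_group(p1, p2, alpha=0.05, beta=0.05):
    """Users needed per group to detect p1 -> p2 (two-sided test, continuity-corrected)."""
    p_bar = (p1 + p2) / 2.0
    za = stats.norm.ppf(1 - alpha / 2)
    zb = stats.norm.ppf(1 - beta)
    A = (za * np.sqrt(2 * p_bar * (1 - p_bar))
         + zb * np.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    delta = abs(p2 - p1)
    n = A * ((1 + np.sqrt(1 + 4 * delta / A)) / (2 * delta)) ** 2
    return int(np.ceil(n))


baseline = 0.05
for lift in [0.03, 0.06, 0.12]:  # relative lifts of 3%, 6% and 12%
    p2 = baseline * (1 + lift)
    print('lift {:>4.0%}: p2 = {:.4f}, n per group = {:,}'.format(
        lift, p2, sample_size_per_group(baseline, p2)))
```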
Next, we simulate 10,000 A/B tests under each hypothesis to estimate the false alarm rate and the power empirically:
n = 555119  # Per-group sample size for the simulation
n_trials = 10000
# Simulate experimental results when the null is true
control0 = stats.binom.rvs(n, p1, size=n_trials)
test0 = stats.binom.rvs(n, p1, size=n_trials)  # Test and control are the same
tables0 = [[[a, n-a], [b, n-b]] for a, b in zip(control0, test0)]
results0 = [stats.chi2_contingency(T) for T in tables0]
decisions0 = [x[1] <= alpha for x in results0]
# Simulate experimental results when the alternative is true
control1 = stats.binom.rvs(n, p1, size=n_trials)
test1 = stats.binom.rvs(n, p2, size=n_trials)  # Test has the higher rate p2
tables1 = [[[a, n-a], [b, n-b]] for a, b in zip(control1, test1)]
results1 = [stats.chi2_contingency(T) for T in tables1]
decisions1 = [x[1] <= alpha for x in results1]
# Compute false alarm and correct detection rates
alpha_est = sum(decisions0)/float(n_trials)
power_est = sum(decisions1)/float(n_trials)
print('Theoretical false alarm rate = {:0.4f}, '.format(alpha) +
      'empirical false alarm rate = {:0.4f}'.format(alpha_est))
print('Theoretical power = {:0.4f}, '.format(1-beta) +
      'empirical power = {:0.4f}'.format(power_est))
Theoretical false alarm rate = 0.0500, empirical false alarm rate = 0.0509
Theoretical power = 0.9500, empirical power = 0.9536
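The empirical rates are themselves estimates from n_trials = 10000 simulated experiments, so they carry Monte Carlo noise. A quick check with the binomial standard error sqrt(r(1 - r)/n_trials) shows whether the gaps between the theoretical and empirical values above are within that noise; the numbers below are copied from the printed output.

```python
import math

n_trials = 10000
# (theoretical rate, empirical rate) pairs taken from the simulation output above
checks = {'false alarm rate': (0.05, 0.0509), 'power': (0.95, 0.9536)}

for name, (theory, empirical) in checks.items():
    se = math.sqrt(theory * (1 - theory) / n_trials)  # binomial standard error
    z = (empirical - theory) / se
    print('{}: SE = {:.4f}, z = {:+.2f}'.format(name, se, z))
```

Both standard errors are about 0.0022, and both |z| values are below 2 (about 0.41 and 1.65), so the simulation agrees with the theoretical α and power.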