A/B Testing with Practice in Python (Part One)
I learned A/B testing from a Youtube vedio. The link is https://www.youtube.com/watch?v=Bu7OqjYk0jM.
I will divide the note into two parts. The first part is generally an overview of hypothesis testing. Most concepts can be found in the article "Statistics Basics: Main Concepts in Hypothesis Testing" and I will focus on pratical applications here.








|
Actual Predicted |
T (H1) | F (H0) |
| T (H1) | TP | FP (α) |
| F (H0) | FN (β) | TN |
P = TP/(TP+FN)
R = 1-β =TP/(TP+FN)




Python Example
Case #1: Alternate hypothesis is true
n = 100
p1 = 0.4
p2 = 0.6 # Compute distributions
x = np.arange(0,n+1)
pmf1 = stats.binom.pmf(x,n,p1)
pmf2 = stats.binom.pmf(x,n,p2)
plot(x,pmf1,pmf2)

We can find that the distributions between Coin 1 and Coin 2 are different. We check different values of m1 and m2.
# Example outcomes
m1, m2 = 40, 60
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.007209570764742524 (reject H0)
# Example outcomes
m1, m2 = 43, 57
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.06599205505934735 (accept H0)
In the secod example, m1 and m2 are not different significantly to reject H0.
Case #2: Null hypothesis is true
n = 100
p1 = 0.5
p2 = 0.5 # Compute distributions
x = np.arange(0,n+1)
pmf1 = stats.binom.pmf(x,n,p1)
pmf2 = stats.binom.pmf(x,n,p2)
plot(x,pmf1,pmf2)

In this case, two distributions overlap because we define the same value of p1 and p2.
# Example outcomes
m1, m2 = 49, 51
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.887537083981715 (accept H0)
Actuall, we can only say that m1 and m2 are not different significantly to reject H0. It doesn't mean we should accept H0. The explanation is given in the previous article.
# Example outcomes
m1, m2 = 42, 58
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.033894853524689295 (reject H0)



Firstly, calculate the sample size:
p1, p2 = 0.0500, 0.0515
alpha = 0.05
beta = 0.05 # Evaluate quantile function
p_bar = (p1+p2)/2.0
za = stats.norm.ppf(1-alpha/2) # Two-sided test
zb = stats.norm.ppf(1-beta) # Compute correction factor
A = (za*np.sqrt(2*p_bar*(1-p_bar))+ zb*np.sqrt(p1*(1-p1)+p2*(1-p2)))**2 #Estimate samples required
n = A*(((1+np.sqrt(1+4*(p1-p2)/A)))/(2*(p1-p2)))**2 print (n) # we need 2n users
555118.7638311392
So for test and control combined we'll need at least 2n = 1.1 million users. This is where this stuff gets hard and you're trying to measure something that doesn't happen often which is usally the thing to care because if it's rare, it's usually valuable. When you're trying to change it, you usually can't change it much because if you can change it a lot then your business would be easier usually it's harder to change the thing you care the most about. So in that case, it's like the hardest case where a/b testing the most values are unchange.
Next we can perform a/b testing:
n = 555119
n_trials = 10000 # Simulate experimental results when null is true
control0 = stats.binom.rvs (n,p1,size = n_trials)
test0 = stats.binom.rvs(n, p1, size = n_trials) # Test and control are the same
tables0 = [[[a, n-a], [b, n-b]] for a, b in zip(control0, test0)]
results0 = [stats.chi2_contingency(T) for T in tables0]
decisions0 = [x[1] <= alpha for x in results0] # Simulate experimental results when alternate is true
control1 = stats.binom.rvs (n,p1,size = n_trials)
test1 = stats.binom.rvs(n, p2, size = n_trials) # Test and control are the same
tables1 = [[[a, n-a], [b, n-b]] for a, b in zip(control1, test1)]
results1 = [stats.chi2_contingency(T) for T in tables1]
decisions1 = [x[1] <= alpha for x in results1] # Compute false alarm and correct detection rates
alpha_est = sum(decisions0)/float(n_trials)
power_est = sum(decisions1)/float(n_trials) print('Theoretical false alarm rate = {:0.4f}, '.format(alpha)+
'empirical false alarm rate = {:0.4f}'.format(alpha_est))
print('Theoretical power = {:0.4f}, '.format(1-beta)+
'empirical power = {:0.4f}'.format(power_est))
Theoretical false alarm rate = 0.0500, empirical false alarm rate = 0.0509
Theoretical power = 0.9500, empirical power = 0.9536

A/B Testing with Practice in Python (Part One)的更多相关文章
- A/B Testing with Practice in Python (Part Two)
This is the second part of A/B testing notes, which contains the practical issues and alternatives o ...
- [Python + Unit Testing] Write Your First Python Unit Test with pytest
In this lesson you will create a new project with a virtual environment and write your first unit te ...
- Testing shell commands from Python
如何测试shell命令?最近,我遇到了一些情况,我想运行shell命令进行测试,Python称为万能胶水语言,一些自动化测试都可以完成,目前手头的工作都是用python完成的.但是无法从Python中 ...
- [The Basics of Hacking and Penetration Testing] Learn & Practice
Remember to consturct your test environment. Kali Linux & Metasploitable2 & Windows XP
- Automation Testing - Best Practice(书写规范)
Coding Standards Coding Standards are suggestions that will help us to write automation Scripts code ...
- Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
- python安装locustio报错error: invalid command 'bdist_wheel'的解决方法
locust--scalable user load testing tool writen in Python(是用python写的.规模化.可扩展的测试性能的工具) 安装locustio需要的环境 ...
- Python框架、库以及软件资源汇总
转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世 ...
- Awesome Python
Awesome Python A curated list of awesome Python frameworks, libraries, software and resources. Insp ...
随机推荐
- 《数据结构》C++代码 邻接表与邻接矩阵
上一篇“BFS与DFS”写完,突然意识到这个可能偏离了“数据结构”的主题,所以回来介绍一下图的存储:邻接表和邻接矩阵. 存图有两种方式,邻接矩阵严格说就是一个bool型的二维数组,map[i][j]表 ...
- 【Neural Network】林轩田机器学习技法
首先从单层神经网络开始介绍 最简单的单层神经网络可以看成是多个Perception的线性组合,这种简单的组合可以达到一些复杂的boundary. 比如,最简单的逻辑运算AND OR NOT都可以由多 ...
- loadrunner检查点设置失败,日志中SaveCount无法被正常统计出来
在脚本正确的情况下的web_reg_find检查点检查失败,SaveCount无法被正常统计出来. 在检查项Text为中文的情况下, ******(我是被录制下来的代码) web_reg_find(& ...
- [shell]查找网段内可用IP地址
#网段可用IP地址 #!/bin/sh ip= " ]; do .$ip -c |grep -q "ttl=" && echo "10.86.8 ...
- 孤荷凌寒自学python第三十九天python 的线程锁Lock
孤荷凌寒自学python第三十九天python的线程锁Lock (完整学习过程屏幕记录视频地址在文末,手写笔记在文末) 当多个线程同时操作一个文件等需要同时操作某一对象的情况发生时,很有可能发生冲突, ...
- Python-有名匿名函数、列表推导式
介绍: 匿名函数: 匿名函数用lambda关键词能创建小型匿名函数.这种函数得名于省略了用def声明函数的标准步骤,节省开辟空间. 列表推导式: 有名函数 #1.有名函数(初始) def squ ...
- HDU 3887 Counting Offspring (树状数组+人工模拟栈)
对这棵树DFS遍历一遍,同一节点入栈和出栈之间访问的节点就是这个节点的子树. 因此节点入栈时求一次 小于 i 的节点个数 和,出栈时求一次 小于 i 的节点个数 和,两次之差就是答案. PS.这题直接 ...
- String类型的方法使用
String.equals()方法源代码: public boolean equals(Object anObject) { if (this == anObject) { return true; ...
- 一些需要注意的ts
写了一段时间ts,在从头学习一遍,温故而之新 ts的一些技巧 1.巧用注释 通过/** */形式的注释可以给 TS 类型做标记,编辑器会有更好的提示: /** A cool guy. */ inter ...
- 【bzoj4822/bzoj1935】[Cqoi2017]老C的任务/[Shoi2007]Tree 园丁的烦恼 树状数组
原文地址:http://www.cnblogs.com/GXZlegend/p/6825530.html bzoj4822 题目描述 老 C 是个程序员. 最近老 C 从老板那里接到了一个任务 ...