I learned A/B testing from a Youtube vedio. The link is https://www.youtube.com/watch?v=Bu7OqjYk0jM.

I will divide the note into two parts. The first part is generally an overview of hypothesis testing. Most concepts can be found in the article "Statistics Basics: Main Concepts in Hypothesis Testing" and I will focus on pratical applications here.

Actual

Predicted

T (H₁)

F (H₀)

T (H₁)

FP (α)

F (H₀)

FN (β)

　　P = TP/(TP+FN)

　　R = 1-β =TP/(TP+FN)

Python Example

Case #1: Alternate hypothesis is true

n = 100

p1 = 0.4

p2 = 0.6

# Compute distributions

x = np.arange(0,n+1)

pmf1 = stats.binom.pmf(x,n,p1)

pmf2 = stats.binom.pmf(x,n,p2)

plot(x,pmf1,pmf2)

We can find that the distributions between Coin 1 and Coin 2 are different. We check different values of m1 and m2.

# Example outcomes

m1, m2 = 40, 60

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.007209570764742524 (reject H0)

# Example outcomes

m1, m2 = 43, 57

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.06599205505934735 (accept H0)

In the secod example, m1 and m2 are not different significantly to reject H0.

Case #2: Null hypothesis is true

n = 100

p1 = 0.5

p2 = 0.5

# Compute distributions

x = np.arange(0,n+1)

pmf1 = stats.binom.pmf(x,n,p1)

pmf2 = stats.binom.pmf(x,n,p2)

plot(x,pmf1,pmf2)

In this case, two distributions overlap because we define the same value of p1 and p2.

# Example outcomes

m1, m2 = 49, 51

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.887537083981715 (accept H0)

Actuall, we can only say that m1 and m2 are not different significantly to reject H0. It doesn't mean we should accept H0. The explanation is given in the previous article.

# Example outcomes

m1, m2 = 42, 58

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.033894853524689295 (reject H0)

Firstly, calculate the sample size:

p1, p2 = 0.0500, 0.0515

alpha = 0.05

beta = 0.05

# Evaluate quantile function

p_bar = (p1+p2)/2.0

za = stats.norm.ppf(1-alpha/2) # Two-sided test

zb = stats.norm.ppf(1-beta)

# Compute correction factor

A = (za*np.sqrt(2*p_bar*(1-p_bar))+ zb*np.sqrt(p1*(1-p1)+p2*(1-p2)))**2

#Estimate samples required

n = A*(((1+np.sqrt(1+4*(p1-p2)/A)))/(2*(p1-p2)))**2

print (n) # we need 2n users

555118.7638311392

So for test and control combined we'll need at least 2n = 1.1 million users. This is where this stuff gets hard and you're trying to measure something that doesn't happen often which is usally the thing to care because if it's rare, it's usually valuable. When you're trying to change it, you usually can't change it much because if you can change it a lot then your business would be easier usually it's harder to change the thing you care the most about. So in that case, it's like the hardest case where a/b testing the most values are unchange.

Next we can perform a/b testing:

n = 555119

n_trials = 10000

# Simulate experimental results when null is true

control0 = stats.binom.rvs (n,p1,size = n_trials)

test0 = stats.binom.rvs(n, p1, size = n_trials) # Test and control are the same

tables0 = [[[a, n-a], [b, n-b]] for a, b in zip(control0, test0)]

results0 = [stats.chi2_contingency(T) for T in tables0]

decisions0 = [x[1] <= alpha for x in results0]

# Simulate experimental results when alternate is true

control1 = stats.binom.rvs (n,p1,size = n_trials)

test1 = stats.binom.rvs(n, p2, size = n_trials) # Test and control are the same

tables1 = [[[a, n-a], [b, n-b]] for a, b in zip(control1, test1)]

results1 = [stats.chi2_contingency(T) for T in tables1]

decisions1 = [x[1] <= alpha for x in results1]

# Compute false alarm and correct detection rates

alpha_est = sum(decisions0)/float(n_trials)

power_est = sum(decisions1)/float(n_trials)

print('Theoretical false alarm rate = {:0.4f}, '.format(alpha)+

     'empirical false alarm rate = {:0.4f}'.format(alpha_est))

print('Theoretical power = {:0.4f}, '.format(1-beta)+

     'empirical power = {:0.4f}'.format(power_est))

Theoretical false alarm rate = 0.0500, empirical false alarm rate = 0.0509

Theoretical power = 0.9500, empirical power = 0.9536

A/B Testing with Practice in Python (Part One)的更多相关文章

A/B Testing with Practice in Python (Part Two)
This is the second part of A/B testing notes, which contains the practical issues and alternatives o ...
[Python + Unit Testing] Write Your First Python Unit Test with pytest
In this lesson you will create a new project with a virtual environment and write your first unit te ...
Testing shell commands from Python
如何测试shell命令?最近,我遇到了一些情况,我想运行shell命令进行测试,Python称为万能胶水语言,一些自动化测试都可以完成,目前手头的工作都是用python完成的.但是无法从Python中 ...
[The Basics of Hacking and Penetration Testing] Learn & Practice
Remember to consturct your test environment. Kali Linux & Metasploitable2 & Windows XP
Automation Testing - Best Practice（书写规范）
Coding Standards Coding Standards are suggestions that will help us to write automation Scripts code ...
Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
python安装locustio报错error: invalid command 'bdist_wheel'的解决方法
locust--scalable user load testing tool writen in Python(是用python写的.规模化.可扩展的测试性能的工具) 安装locustio需要的环境 ...
Python框架、库以及软件资源汇总
转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世 ...
Awesome Python
Awesome Python A curated list of awesome Python frameworks, libraries, software and resources. Insp ...

随机推荐

《Cracking the Coding Interview》——第8章：面向对象设计——题目1
2014-04-23 17:32 题目:请设计一个数据结构来模拟一副牌,你要如何用这副牌玩21点呢? 解法:说实话,扑克牌的花样在于各种花色.顺子.连对.三带一.炸弹等等,如果能设计一个数据结构,让判 ...
天性 & 如水一般，遇强则强 —— 梦想、行为、性格
开篇声明,我博客中“小心情”这一系列,全都是日记啊随笔啊什么乱七八糟的.如果一不小心点进来了,不妨直接关掉.我自己曾经写过一段时间的日记,常常翻看,毫无疑问我的文笔是很差的,而且心情也是瞬息万变的.因 ...
Python SimpleHTTPServer简单HTTP服务器
搭建FTP,或者是搭建网络文件系统,这些方法都能够实现Linux的目录共享.但是FTP和网络文件系统的功能都过于强大,因此它们都有一些不够方便的地方.比如你想快速共享Linux系统的某个目录给整个项目 ...
css 之 border-radius属性
css中给盒子设置圆角可以通过 border-radius 属性来实现(具体原理就不深入探讨了); 在开发过程中都会遇到浏览器兼容问题,这问题其实也不难解决,无非就是加上私有前缀,在这里先忽略掉. ...
添加selenium对应的jar包至pom.xml
1.进入https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java,点开相应的版本 2.复制图中选中的代码,粘贴至 ...
Android数据储存之SharedPreferences总结
写在前面:本文是我参考李刚老师的<疯狂Android讲义>以及API所写的读书笔记,在此表示感谢,本人小白,如有错误敬请指教. SharedPreferences的使用背景: 有时候,应用 ...
js万年历
首先,注意: 1.延迟执行 window.setTimeout( , ) 里面的时间是以毫秒计算的 2.间隔执行 window.setInterval( , ...
通过设计表快速了解sql语句中字段的含义
打开Navicat-------> 选择数据库 ------->右键设计表------>查看下方注释
spring3创建RESTFul Web Service
spring 3支持创建RESTFul Web Service,使用起来非常简单.不外乎一个@ResponseBody的问题. 例如:后台controller: 做一个JSP页面,使用ajax获取数据 ...
poj 3311 floyd+dfs或状态压缩dp 两种方法
Hie with the Pie Time Limit: 2000MS Memory Limit: 65536K Total Submissions: 6436 Accepted: 3470 ...

A/B Testing with Practice in Python (Part One)

Python Example

Case #1: Alternate hypothesis is true

Case #2: Null hypothesis is true

A/B Testing with Practice in Python (Part One)的更多相关文章

随机推荐

热门专题