I learned A/B testing from a Youtube vedio. The link is https://www.youtube.com/watch?v=Bu7OqjYk0jM.

I will divide the note into two parts. The first part is generally an overview of hypothesis testing. Most concepts can be found in the article  "Statistics Basics: Main Concepts in Hypothesis Testing" and I will focus on pratical applications here.

 

Actual

Predicted

T (H1) F (H0)
T (H1) TP FP (α)
F (H0) FN (β) TN

  P = TP/(TP+FN)

  R = 1-β =TP/(TP+FN)

Python Example

Case #1: Alternate hypothesis is true

n = 100
p1 = 0.4
p2 = 0.6 # Compute distributions
x = np.arange(0,n+1)
pmf1 = stats.binom.pmf(x,n,p1)
pmf2 = stats.binom.pmf(x,n,p2)
plot(x,pmf1,pmf2)

We can find that the distributions between Coin 1 and Coin 2 are different. We check different values of m1 and m2.

# Example outcomes
m1, m2 = 40, 60
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.007209570764742524 (reject H0)
# Example outcomes
m1, m2 = 43, 57
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.06599205505934735 (accept H0)

In the secod example, m1 and m2 are not different significantly to reject H0.

Case #2: Null hypothesis is true

n = 100
p1 = 0.5
p2 = 0.5 # Compute distributions
x = np.arange(0,n+1)
pmf1 = stats.binom.pmf(x,n,p1)
pmf2 = stats.binom.pmf(x,n,p2)
plot(x,pmf1,pmf2)

In this case, two distributions overlap because we define the same value of p1 and p2.

# Example outcomes
m1, m2 = 49, 51
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.887537083981715 (accept H0)

Actuall, we can only say that m1 and m2 are not different significantly to reject H0. It doesn't mean we should accept H0. The explanation is given in the previous article.

# Example outcomes
m1, m2 = 42, 58
table = [[m1, n-m1], [m2, n-m2]]
chi2, pval, dof, expected = stats.chi2_contingency(table)
decision = 'reject H0' if pval<0.05 else 'accept H0'
print('{} ({})'.format(pval,decision))
0.033894853524689295 (reject H0)

Firstly, calculate the sample size:

p1, p2 = 0.0500, 0.0515
alpha = 0.05
beta = 0.05 # Evaluate quantile function
p_bar = (p1+p2)/2.0
za = stats.norm.ppf(1-alpha/2) # Two-sided test
zb = stats.norm.ppf(1-beta) # Compute correction factor
A = (za*np.sqrt(2*p_bar*(1-p_bar))+ zb*np.sqrt(p1*(1-p1)+p2*(1-p2)))**2 #Estimate samples required
n = A*(((1+np.sqrt(1+4*(p1-p2)/A)))/(2*(p1-p2)))**2 print (n) # we need 2n users
555118.7638311392

So for test and control combined we'll need at least 2n = 1.1 million users. This is where this stuff gets hard and you're trying to measure something that doesn't happen often which is usally the thing to care because if it's rare, it's usually valuable. When you're trying to change it, you usually can't change it much because if you can change it a lot then your business would be easier usually it's harder to change the thing you care the most about. So in that case, it's like the hardest case where a/b testing the most values are unchange.

Next we can perform a/b testing:

n = 555119
n_trials = 10000 # Simulate experimental results when null is true
control0 = stats.binom.rvs (n,p1,size = n_trials)
test0 = stats.binom.rvs(n, p1, size = n_trials) # Test and control are the same
tables0 = [[[a, n-a], [b, n-b]] for a, b in zip(control0, test0)]
results0 = [stats.chi2_contingency(T) for T in tables0]
decisions0 = [x[1] <= alpha for x in results0] # Simulate experimental results when alternate is true
control1 = stats.binom.rvs (n,p1,size = n_trials)
test1 = stats.binom.rvs(n, p2, size = n_trials) # Test and control are the same
tables1 = [[[a, n-a], [b, n-b]] for a, b in zip(control1, test1)]
results1 = [stats.chi2_contingency(T) for T in tables1]
decisions1 = [x[1] <= alpha for x in results1] # Compute false alarm and correct detection rates
alpha_est = sum(decisions0)/float(n_trials)
power_est = sum(decisions1)/float(n_trials) print('Theoretical false alarm rate = {:0.4f}, '.format(alpha)+
'empirical false alarm rate = {:0.4f}'.format(alpha_est))
print('Theoretical power = {:0.4f}, '.format(1-beta)+
'empirical power = {:0.4f}'.format(power_est))
Theoretical false alarm rate = 0.0500, empirical false alarm rate = 0.0509
Theoretical power = 0.9500, empirical power = 0.9536

A/B Testing with Practice in Python (Part One)的更多相关文章

  1. A/B Testing with Practice in Python (Part Two)

    This is the second part of A/B testing notes, which contains the practical issues and alternatives o ...

  2. [Python + Unit Testing] Write Your First Python Unit Test with pytest

    In this lesson you will create a new project with a virtual environment and write your first unit te ...

  3. Testing shell commands from Python

    如何测试shell命令?最近,我遇到了一些情况,我想运行shell命令进行测试,Python称为万能胶水语言,一些自动化测试都可以完成,目前手头的工作都是用python完成的.但是无法从Python中 ...

  4. [The Basics of Hacking and Penetration Testing] Learn & Practice

    Remember to consturct your test environment. Kali Linux & Metasploitable2 & Windows XP

  5. Automation Testing - Best Practice(书写规范)

    Coding Standards Coding Standards are suggestions that will help us to write automation Scripts code ...

  6. Machine and Deep Learning with Python

    Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...

  7. python安装locustio报错error: invalid command 'bdist_wheel'的解决方法

    locust--scalable user load testing tool writen in Python(是用python写的.规模化.可扩展的测试性能的工具) 安装locustio需要的环境 ...

  8. Python框架、库以及软件资源汇总

    转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世 ...

  9. Awesome Python

    Awesome Python  A curated list of awesome Python frameworks, libraries, software and resources. Insp ...

随机推荐

  1. loadrunner 欺骗ip设置

    工具准备:loadrunner12,windows 10 ip欺骗=ip wizard 前提条件:本机IP地址为固定地址,不是自动获取的地址 方法: 1.管理员身份打开cmd 2.输入命令:confi ...

  2. day-python入门3

    本节内容 鸡汤.电影 IDE介绍 知识回顾 数据类型 For循环 while循环 列表及常用操作 IDE介绍   IDE即集成开发环境        常见IDE   Visualstudio  : w ...

  3. 解决Navicat for MySQL 连接 Mysql 8.0.11 出现1251- Client does not support authentication protocol 错误

    安装MySQL8.0之后,使用Navicat连接数据库,报1251错误. 上网搜索解决方案,网上说出现这种情况的原因是:mysql8 之前的版本中加密规则是mysql_native_password, ...

  4. NOIP2018 集训(一)

    A题 Simple 时间限制:1000ms | 空间限制:256MB 问题描述 对于给定正整数\(n,m\),我们称正整数\(c\)为好的,当且仅当存在非负整数\(x,y\)使得\(n×x+m×y=c ...

  5. 抓取HTML网页数据

    (转)htmlparse filter使用 该类并不是一个通用的工具类,需要按自己的要求实现,这里只记录了Htmlparse.jar包的一些用法.仅此而已! 详细看这里:http://gundumw1 ...

  6. poj 2406 Power Strings (后缀数组 || KMP)

    Power Strings Time Limit: 3000MS   Memory Limit: 65536K Total Submissions: 28859   Accepted: 12045 D ...

  7. 2016年NK冬季训练赛 民间题解

    A题 水题,考察对行的读入和处理,注意使用long long #include <iostream> #include <cstring> #include <cstdi ...

  8. Tomcat学习笔记(一)

    Tomcat目录结构的认识 tomcat是Apache旗下的一个开源Servlet的容器,实现了对Servlet和JSP技术支持. 通过http://tomcat.apache.org/ 下载tomc ...

  9. hashCode()方法和equals方法的重要性。

    在Object中有两个重要的方法:hashCode()和equals(Object obj)方法,并且当你按ctrl+alt+s时会有Generator hashCode()和equals().我们不 ...

  10. Pandas之DataFrame——Part 1

    ''' [课程2.] Pandas数据结构Dataframe:基本概念及创建 "二维数组"Dataframe:是一个表格型的数据结构,包含一组有序的列,其列的值类型可以是数值.字符 ...