I learned A/B testing from a Youtube vedio. The link is https://www.youtube.com/watch?v=Bu7OqjYk0jM.

I will divide the note into two parts. The first part is generally an overview of hypothesis testing. Most concepts can be found in the article "Statistics Basics: Main Concepts in Hypothesis Testing" and I will focus on pratical applications here.

Actual

Predicted

T (H₁)

F (H₀)

T (H₁)

FP (α)

F (H₀)

FN (β)

　　P = TP/(TP+FN)

　　R = 1-β =TP/(TP+FN)

Python Example

Case #1: Alternate hypothesis is true

n = 100

p1 = 0.4

p2 = 0.6

# Compute distributions

x = np.arange(0,n+1)

pmf1 = stats.binom.pmf(x,n,p1)

pmf2 = stats.binom.pmf(x,n,p2)

plot(x,pmf1,pmf2)

We can find that the distributions between Coin 1 and Coin 2 are different. We check different values of m1 and m2.

# Example outcomes

m1, m2 = 40, 60

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.007209570764742524 (reject H0)

# Example outcomes

m1, m2 = 43, 57

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.06599205505934735 (accept H0)

In the secod example, m1 and m2 are not different significantly to reject H0.

Case #2: Null hypothesis is true

n = 100

p1 = 0.5

p2 = 0.5

# Compute distributions

x = np.arange(0,n+1)

pmf1 = stats.binom.pmf(x,n,p1)

pmf2 = stats.binom.pmf(x,n,p2)

plot(x,pmf1,pmf2)

In this case, two distributions overlap because we define the same value of p1 and p2.

# Example outcomes

m1, m2 = 49, 51

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.887537083981715 (accept H0)

Actuall, we can only say that m1 and m2 are not different significantly to reject H0. It doesn't mean we should accept H0. The explanation is given in the previous article.

# Example outcomes

m1, m2 = 42, 58

table = [[m1, n-m1], [m2, n-m2]]

chi2, pval, dof, expected = stats.chi2_contingency(table)

decision = 'reject H0' if pval<0.05 else 'accept H0'

print('{} ({})'.format(pval,decision))

0.033894853524689295 (reject H0)

Firstly, calculate the sample size:

p1, p2 = 0.0500, 0.0515

alpha = 0.05

beta = 0.05

# Evaluate quantile function

p_bar = (p1+p2)/2.0

za = stats.norm.ppf(1-alpha/2) # Two-sided test

zb = stats.norm.ppf(1-beta)

# Compute correction factor

A = (za*np.sqrt(2*p_bar*(1-p_bar))+ zb*np.sqrt(p1*(1-p1)+p2*(1-p2)))**2

#Estimate samples required

n = A*(((1+np.sqrt(1+4*(p1-p2)/A)))/(2*(p1-p2)))**2

print (n) # we need 2n users

555118.7638311392

So for test and control combined we'll need at least 2n = 1.1 million users. This is where this stuff gets hard and you're trying to measure something that doesn't happen often which is usally the thing to care because if it's rare, it's usually valuable. When you're trying to change it, you usually can't change it much because if you can change it a lot then your business would be easier usually it's harder to change the thing you care the most about. So in that case, it's like the hardest case where a/b testing the most values are unchange.

Next we can perform a/b testing:

n = 555119

n_trials = 10000

# Simulate experimental results when null is true

control0 = stats.binom.rvs (n,p1,size = n_trials)

test0 = stats.binom.rvs(n, p1, size = n_trials) # Test and control are the same

tables0 = [[[a, n-a], [b, n-b]] for a, b in zip(control0, test0)]

results0 = [stats.chi2_contingency(T) for T in tables0]

decisions0 = [x[1] <= alpha for x in results0]

# Simulate experimental results when alternate is true

control1 = stats.binom.rvs (n,p1,size = n_trials)

test1 = stats.binom.rvs(n, p2, size = n_trials) # Test and control are the same

tables1 = [[[a, n-a], [b, n-b]] for a, b in zip(control1, test1)]

results1 = [stats.chi2_contingency(T) for T in tables1]

decisions1 = [x[1] <= alpha for x in results1]

# Compute false alarm and correct detection rates

alpha_est = sum(decisions0)/float(n_trials)

power_est = sum(decisions1)/float(n_trials)

print('Theoretical false alarm rate = {:0.4f}, '.format(alpha)+

     'empirical false alarm rate = {:0.4f}'.format(alpha_est))

print('Theoretical power = {:0.4f}, '.format(1-beta)+

     'empirical power = {:0.4f}'.format(power_est))

Theoretical false alarm rate = 0.0500, empirical false alarm rate = 0.0509

Theoretical power = 0.9500, empirical power = 0.9536

A/B Testing with Practice in Python (Part One)的更多相关文章

A/B Testing with Practice in Python (Part Two)
This is the second part of A/B testing notes, which contains the practical issues and alternatives o ...
[Python + Unit Testing] Write Your First Python Unit Test with pytest
In this lesson you will create a new project with a virtual environment and write your first unit te ...
Testing shell commands from Python
如何测试shell命令?最近,我遇到了一些情况,我想运行shell命令进行测试,Python称为万能胶水语言,一些自动化测试都可以完成,目前手头的工作都是用python完成的.但是无法从Python中 ...
[The Basics of Hacking and Penetration Testing] Learn & Practice
Remember to consturct your test environment. Kali Linux & Metasploitable2 & Windows XP
Automation Testing - Best Practice（书写规范）
Coding Standards Coding Standards are suggestions that will help us to write automation Scripts code ...
Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
python安装locustio报错error: invalid command 'bdist_wheel'的解决方法
locust--scalable user load testing tool writen in Python(是用python写的.规模化.可扩展的测试性能的工具) 安装locustio需要的环境 ...
Python框架、库以及软件资源汇总
转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世 ...
Awesome Python
Awesome Python A curated list of awesome Python frameworks, libraries, software and resources. Insp ...

随机推荐

Python 列表、元组、字典及集合操作详解
一.列表列表是Python中最基本的数据结构,是最常用的Python数据类型,列表的数据项不需要具有相同的类型列表是一种有序的集合,可以随时添加和删除其中的元素列表的索引从0开始 1.创建列表 ...
Jmeter 场景设计
今天的业务场景是: 1.管理员登录后台---登录成功后添加一个某类型的产品---产品添加成功后,再为该产品添加10个排期. 2.管理员登录后台--登录成功后添加多个不同类型产品---产品全部添加完成后 ...
Canvas 剪切图片
/** * 剪切图像 */ function initDemo8(){ var canvas = document.getElementById("demo8"); if (!ca ...
易语言.开源（绝地求生多功能盒子）类似LOL盒子
下载地址:https://pan.baidu.com/s/1OXwCjGJODkcZVrCwVixu3Q 成品地址:https://pan.lanzou.com/i0rmdwj
NodeJS05
商品分类模块分类model const mongoose = require('mongoose') const schema = new mongoose.Schema({ name: { typ ...
ASP.NET 抓取网页内容
(转)ASP.NET 抓取网页内容 ASP.NET 抓取网页内容-文字 ASP.NET 中抓取网页内容是非常方便的,而其中更是解决了 ASP 中困扰我们的编码问题. 需要三个类:WebRequest. ...
Android记事本开发02
今天: 继续学习基础知识. 昨天: 学习了ADB工具的基本命令. Android项目的目录结构. AndroidManifest.xml Android应用程序的打包和安装遇到的问题: 无.
hadoop2.5.2学习及实践笔记（三）—— HDFS概念及体系结构
注:文中涉及的文件路径或配置文件中属性名称是针对hadoop2.X系列,相对于之前版本,可能有改动. 附: HDFS用户指南官方介绍: http://hadoop.apache.org/docs/r2 ...
ubuntu16.04 使用问题笔记
1.问题: 下列软件包有未满足的依赖关系: vim : 依赖: vim-common (= 2:7.4.826-1ubuntu1) 但是 2:7.4.1689-3ubuntu1 正要被安装 E: 无法 ...
IPV4的地址是如何分类的？网络号的范围分别是多少？
1. A类地址 (1)A类地址第1字节为网络地址,其它3个字节为主机地址. (2)A类地址范围:1.0.0.1—126.255.255.254 (3)A类地址中的私有地址和保留地址: ① 10.X.X ...

A/B Testing with Practice in Python (Part One)

Python Example

Case #1: Alternate hypothesis is true

Case #2: Null hypothesis is true

A/B Testing with Practice in Python (Part One)的更多相关文章

随机推荐

热门专题