Z-score

import numpy as np

def outliers_z_score(ys):
threshold = 3 mean_y = np.mean(ys)
stdev_y = np.std(ys)
z_scores = [(y - mean_y) / stdev_y for y in ys]
return np.where(np.abs(z_scores) > threshold)

Modified Z-score

import numpy as np

def outliers_modified_z_score(ys):
threshold = 3.5 median_y = np.median(ys)
median_absolute_deviation_y = np.median([np.abs(y - median_y) for y in ys])
modified_z_scores = [0.6745 * (y - median_y) / median_absolute_deviation_y
for y in ys]
return np.where(np.abs(modified_z_scores) > threshold)

IQR(interquartile range)

import numpy as np

def outliers_iqr(ys):
quartile_1, quartile_3 = np.percentile(ys, [25, 75])
iqr = quartile_3 - quartile_1
lower_bound = quartile_1 - (iqr * 1.5)
upper_bound = quartile_3 + (iqr * 1.5)
return np.where((ys > upper_bound) | (ys < lower_bound))

Conclusion

It is important to reiterate that these methods should not be used mechanically.
They should be used to explore the data – they let you know which points might be worth a closer look.
What to do with this information depends heavily on the situation.
Sometimes it is appropriate to exclude outliers from a dataset to make a model trained on that dataset more predictive.
Sometimes, however,
the presence of outliers is a warning sign that the real-world process generating the data is more complicated than expected. As an astute commenter on CrossValidated put it:
“Sometimes outliers are bad data, and should be excluded, such as typos.
Sometimes they are Wayne Gretzky or Michael Jordan, and should be kept.” Domain knowledge and practical wisdom are the only ways to tell the difference.

  

摘自:http://colingorrie.github.io/outlier-detection.html

Three ways to detect outliers的更多相关文章

  1. MATLAB remove outliers.

    Answer by Richard Willey on 9 Jan 2012 Hi Michael MATLAB doesn't provide a specific function to remo ...

  2. 异常值处理outlier

    python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...

  3. 深度学习图像配准 Image Registration: From SIFT to Deep Learning

    Image Registration is a fundamental step in Computer Vision. In this article, we present OpenCV feat ...

  4. (转)利用libcurl和国内著名的两个物联网云端通讯的例程, ubuntu和openwrt下调试成功(四)

    1. libcurl 的参考文档如下 CURLOPT_HEADERFUNCTION Pass a pointer to a function that matches the following pr ...

  5. 如何查找物理cpu,cpu核心和逻辑cpu的数量

    环境 Red Hat Enterprise Linux 4 Red Hat Enterprise Linux 5 Red Hat Enterprise Linux 6 Red Hat Enterpri ...

  6. Gartner 2018 年WAF魔力象限报告:云WAF持续增长,Bot管理与API安全拥有未来

    Gartner 2018 年WAF魔力象限报告:云WAF持续增长,Bot管理与API安全拥有未来 来源 https://www.freebuf.com/articles/paper/184903.ht ...

  7. A Gentle Guide to Machine Learning

    A Gentle Guide to Machine Learning Machine Learning is a subfield within Artificial Intelligence tha ...

  8. [linux]linuxI/O测试的方法之dd

    参考http://www.thomas-krenn.com/en/wiki/Linux_I/O_Performance_Tests_using_dd Measuring Write Performan ...

  9. WGCNA构建基因共表达网络详细教程

    这篇文章更多的是对于混乱的中文资源的梳理,并补充了一些没有提到的重要参数,希望大家不会踩坑. 1. 简介 1.1 背景 WGCNA(weighted gene co-expression networ ...

随机推荐

  1. python接口自动化-get请求

    一.环境安装 1.用pip安装requests模块 >>pip install requests 二.get请求 1.  url 1.1:   response 的返回内容还有很多信息,例 ...

  2. 网络安全实验室--SQL注入关

    第一关 万能密码:username='or '1'='1'#    password=1    即可登录得到flag. 第二关 最基础的注入,order by 判断字段数,然后 union selec ...

  3. 《通过C#学Proto.Actor模型》之Persistence

    Actor是有状态的,当每一步执行失败后,返回失败地方继续执行时,希望此时的状态是正确的,为了保证这一点,持久化就成了必要的环节了. Proto.Actor提供了三种方式执久化: Event Sour ...

  4. Spring Security Oauth2 的配置

    使用oauth2保护你的应用,可以分为简易的分为三个步骤 配置资源服务器 配置认证服务器 配置spring security 前两点是oauth2的主体内容,但前面我已经描述过了,spring sec ...

  5. 网站添加icon

    设置网站的icon<link rel="shortcut icon" href="./static/img/favicon.ico" >

  6. [Alpha阶段]第四次Scrum Meeting

    Scrum Meeting博客目录 [Alpha阶段]第四次Scrum Meeting 基本信息 名称 时间 地点 时长 第四次Scrum Meeting 19/04/08 大运村寝室6楼 50min ...

  7. 关于PHP自动捕捉处理错误和异常的尝试

    之所以想着做错误和异常的自动处理是因为: 用的公司自己的框架写API,没有异常和错误相关功能, 而每次操作都进行try...catch,有点繁琐不说,感觉还很鸡肋,即使我catch到了,还是得写代码进 ...

  8. php函数 array_chunk

    array_chunk ( array $array , int $size [, bool $preserve_keys = false ] ) : array 将一个数组分割成多个数组,其中每个数 ...

  9. [转帖]Oracle 12cR2使用经验

    大规模升级来临,谈谈Oracle 12cR2使用经验 随着2019年2月13日,Oracle 19c (Oracle 12.2.0.3) for Exadata 版本发布,Oracle 12cR2体系 ...

  10. vue.js实战——购物车练习(包含全选功能)

    vue.js实战第5章 54页的练习1 直接放代码好了,全选的部分搞了好久,代码好像有点啰嗦,好在实现功能了(*^▽^*) HTML: <!DOCTYPE html> <html l ...