Three ways to detect outliers
Z-score
import numpy as np def outliers_z_score(ys):
threshold = 3 mean_y = np.mean(ys)
stdev_y = np.std(ys)
z_scores = [(y - mean_y) / stdev_y for y in ys]
return np.where(np.abs(z_scores) > threshold)
Modified Z-score
import numpy as np def outliers_modified_z_score(ys):
threshold = 3.5 median_y = np.median(ys)
median_absolute_deviation_y = np.median([np.abs(y - median_y) for y in ys])
modified_z_scores = [0.6745 * (y - median_y) / median_absolute_deviation_y
for y in ys]
return np.where(np.abs(modified_z_scores) > threshold)
IQR(interquartile range)
import numpy as np def outliers_iqr(ys):
quartile_1, quartile_3 = np.percentile(ys, [25, 75])
iqr = quartile_3 - quartile_1
lower_bound = quartile_1 - (iqr * 1.5)
upper_bound = quartile_3 + (iqr * 1.5)
return np.where((ys > upper_bound) | (ys < lower_bound))
Conclusion
It is important to reiterate that these methods should not be used mechanically.
They should be used to explore the data – they let you know which points might be worth a closer look.
What to do with this information depends heavily on the situation.
Sometimes it is appropriate to exclude outliers from a dataset to make a model trained on that dataset more predictive.
Sometimes, however,
the presence of outliers is a warning sign that the real-world process generating the data is more complicated than expected. As an astute commenter on CrossValidated put it:
“Sometimes outliers are bad data, and should be excluded, such as typos.
Sometimes they are Wayne Gretzky or Michael Jordan, and should be kept.” Domain knowledge and practical wisdom are the only ways to tell the difference.
摘自:http://colingorrie.github.io/outlier-detection.html
Three ways to detect outliers的更多相关文章
- MATLAB remove outliers.
Answer by Richard Willey on 9 Jan 2012 Hi Michael MATLAB doesn't provide a specific function to remo ...
- 异常值处理outlier
python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...
- 深度学习图像配准 Image Registration: From SIFT to Deep Learning
Image Registration is a fundamental step in Computer Vision. In this article, we present OpenCV feat ...
- (转)利用libcurl和国内著名的两个物联网云端通讯的例程, ubuntu和openwrt下调试成功(四)
1. libcurl 的参考文档如下 CURLOPT_HEADERFUNCTION Pass a pointer to a function that matches the following pr ...
- 如何查找物理cpu,cpu核心和逻辑cpu的数量
环境 Red Hat Enterprise Linux 4 Red Hat Enterprise Linux 5 Red Hat Enterprise Linux 6 Red Hat Enterpri ...
- Gartner 2018 年WAF魔力象限报告:云WAF持续增长,Bot管理与API安全拥有未来
Gartner 2018 年WAF魔力象限报告:云WAF持续增长,Bot管理与API安全拥有未来 来源 https://www.freebuf.com/articles/paper/184903.ht ...
- A Gentle Guide to Machine Learning
A Gentle Guide to Machine Learning Machine Learning is a subfield within Artificial Intelligence tha ...
- [linux]linuxI/O测试的方法之dd
参考http://www.thomas-krenn.com/en/wiki/Linux_I/O_Performance_Tests_using_dd Measuring Write Performan ...
- WGCNA构建基因共表达网络详细教程
这篇文章更多的是对于混乱的中文资源的梳理,并补充了一些没有提到的重要参数,希望大家不会踩坑. 1. 简介 1.1 背景 WGCNA(weighted gene co-expression networ ...
随机推荐
- MySQL 索引创建及使用
索引的类型 PRIMARY KEY(主键索引): 用来标识唯一性,数据不可重复 ,主键列不能为NULL,并且每个表中有且只能有一个主键,还可以创建复合主键,即多个字段组合起来. 创建语句为: -- ...
- 调用远程主机上的 RMI 服务时抛出 java.rmi.ConnectException: Connection refused to host: 127.0.0.1 异常原因及解决方案
最近使用 jmx 遇到一个问题,client/server 同在一台机器上,jmx client能够成功连接 server,如果把 server 移植到另一台机器上192.168.134.128,抛出 ...
- 合并两个有序链表的golang实现
将两个有序链表合并为一个新的有序链表并返回.新链表是通过拼接给定的两个链表的所有节点组成的. 输入:->->, ->-> 输出:->->->->-> ...
- Logging 日志配置格式模板
import osBASE_DIR = os.path.dirname(os.path.dirname(__file__))DB_PATH = os.path.join(BASE_DIR, 'db') ...
- WPF动态模板选择的两种实现
前言 .net开发工作了六年,看了大量的博客,现在想开始自己写博客,这是我的第一篇博客,试试水,就从自己最常使用的WPF开始. 今天我来给大家分享可用户动态选择控件模板的两种实现方式:DataTrig ...
- 追逐心目中的那个Ta
申明:全篇皆为作者臆想,浪漫主义代表派作品,若有雷同,纯属巧合 人生最难过的不就是在一无所有的年纪里遇到了最想呵护一生的人,而在拥有一切的时候却失去了不顾一切的心. 长夜漫漫,本是相思人,偏听多情曲, ...
- python 枚举Enum
常量是任何一门语言中都会使用的一种变量类型 如 要表示星期常量,我们可能会直接定义一组变量 JAN = 1 TWO = 2 ... 然后在返回给前端的时候,我们返回的就会是1,2,...这种魔法数字, ...
- git 版本管控 发布
https://www.cnblogs.com/charlesblc/p/6051569.html http://www.ruanyifeng.com/blog/2012/07/git.html 1. ...
- vue组件之前嵌套
https://www.cnblogs.com/chengduxiaoc/p/7099552.html <!DOCTYPE html> <html lang="en&quo ...
- Java 面向对象的基本特征
前言: 在刚开始接触Java的时候,那时候面对Java面向对象的几大特征一直理解的不是很理解,借着空闲时间在这里整理一下,同时在加深一下印象. 一.封装: Java面向对象的特征之封装,所谓的封装就 ...