Z-score

import numpy as np

def outliers_z_score(ys):
threshold = 3 mean_y = np.mean(ys)
stdev_y = np.std(ys)
z_scores = [(y - mean_y) / stdev_y for y in ys]
return np.where(np.abs(z_scores) > threshold)

Modified Z-score

import numpy as np

def outliers_modified_z_score(ys):
threshold = 3.5 median_y = np.median(ys)
median_absolute_deviation_y = np.median([np.abs(y - median_y) for y in ys])
modified_z_scores = [0.6745 * (y - median_y) / median_absolute_deviation_y
for y in ys]
return np.where(np.abs(modified_z_scores) > threshold)

IQR(interquartile range)

import numpy as np

def outliers_iqr(ys):
quartile_1, quartile_3 = np.percentile(ys, [25, 75])
iqr = quartile_3 - quartile_1
lower_bound = quartile_1 - (iqr * 1.5)
upper_bound = quartile_3 + (iqr * 1.5)
return np.where((ys > upper_bound) | (ys < lower_bound))

Conclusion

It is important to reiterate that these methods should not be used mechanically.
They should be used to explore the data – they let you know which points might be worth a closer look.
What to do with this information depends heavily on the situation.
Sometimes it is appropriate to exclude outliers from a dataset to make a model trained on that dataset more predictive.
Sometimes, however,
the presence of outliers is a warning sign that the real-world process generating the data is more complicated than expected. As an astute commenter on CrossValidated put it:
“Sometimes outliers are bad data, and should be excluded, such as typos.
Sometimes they are Wayne Gretzky or Michael Jordan, and should be kept.” Domain knowledge and practical wisdom are the only ways to tell the difference.

  

摘自:http://colingorrie.github.io/outlier-detection.html

Three ways to detect outliers的更多相关文章

  1. MATLAB remove outliers.

    Answer by Richard Willey on 9 Jan 2012 Hi Michael MATLAB doesn't provide a specific function to remo ...

  2. 异常值处理outlier

    python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...

  3. 深度学习图像配准 Image Registration: From SIFT to Deep Learning

    Image Registration is a fundamental step in Computer Vision. In this article, we present OpenCV feat ...

  4. (转)利用libcurl和国内著名的两个物联网云端通讯的例程, ubuntu和openwrt下调试成功(四)

    1. libcurl 的参考文档如下 CURLOPT_HEADERFUNCTION Pass a pointer to a function that matches the following pr ...

  5. 如何查找物理cpu,cpu核心和逻辑cpu的数量

    环境 Red Hat Enterprise Linux 4 Red Hat Enterprise Linux 5 Red Hat Enterprise Linux 6 Red Hat Enterpri ...

  6. Gartner 2018 年WAF魔力象限报告:云WAF持续增长,Bot管理与API安全拥有未来

    Gartner 2018 年WAF魔力象限报告:云WAF持续增长,Bot管理与API安全拥有未来 来源 https://www.freebuf.com/articles/paper/184903.ht ...

  7. A Gentle Guide to Machine Learning

    A Gentle Guide to Machine Learning Machine Learning is a subfield within Artificial Intelligence tha ...

  8. [linux]linuxI/O测试的方法之dd

    参考http://www.thomas-krenn.com/en/wiki/Linux_I/O_Performance_Tests_using_dd Measuring Write Performan ...

  9. WGCNA构建基因共表达网络详细教程

    这篇文章更多的是对于混乱的中文资源的梳理,并补充了一些没有提到的重要参数,希望大家不会踩坑. 1. 简介 1.1 背景 WGCNA(weighted gene co-expression networ ...

随机推荐

  1. SQLServr添加数据列

    数据列定义 表中数据行的数据插入和数据类型都是基于数据列的,学会添加数据列在开发过程中是必不可少的. 使用SSMS数据库管理工具添加数据列 在数据表中添加一列或者多列步骤相同 1.连接数据库,选择数据 ...

  2. Thread中yield方法

    先上一段代码 public class YieldExcemple { public static void main(String[] args) { Thread threada = new Th ...

  3. RabbitMQ持久化

    我们知道,如果消息接收端挂了,消息会保存在队列里.下次接收端启动就会接收到消息. 如果RabbitMQ挂了怎么办呢?这时候需要将消息持久化到硬盘 消息发送端:producer ........... ...

  4. Linux删除文件夹和修改文件名

    rm [选项] 文件 -f, --force 强力删除,不要求确认 -i 每删除一个文件或进入一个子目录都要求确认 -I 在删除超过三个文件或者递归删除前要求确认 -r, -R 递归删除子目录 -d, ...

  5. Centos7.x做开机启动脚本

    cat /etc/centos-release CentOS Linux release 7.4.1708 (Core) uname -r 3.10.0-693.11.1.el7.x86_64 vim ...

  6. C# — 调用dll出现试图加载不正确格式的程序问题

    今天在调用百度dll包时,运行项目出现了如下警告: 修改:鼠标右击项目名称----选择属性----生成-----平台目标-----X64(由于我调用的是X64的dll包,所以这里选择X64,网上许多说 ...

  7. Filebeat配置参考手册

    Filebeat的配置参考 指定要运行的模块 前提: 在运行Filebeat模块之前,需要安装并配置Elastic堆栈: 安装Ingest Node GeoIP和User Agent插件.这些插件需要 ...

  8. BZOJ4025 二分图 线段树分治、带权并查集

    传送门 如果边不会消失,那么显然可以带权并查集做(然后发现自己不会写带权并查集) 但是每条边有消失时间.这样每一条边产生贡献的时间对应一段区间,故对时间轴建立线段树,将每一条边扔到线段树对应的点上. ...

  9. Asp.Net Core SignalR 用泛型Hub优雅的调用前端方法及传参

    继续学习 最近一直在使用Asp.Net Core SignalR(下面成SignalR Core)为小程序提供websocket支持,前端时间也发了一个学习笔记,在使用过程中稍微看了下它的源码,不得不 ...

  10. Ubuntu下解压压缩文件

    1.ZIP解压    ZIP因为它的跨平台使用优点,是目前使用率最高的一种压缩方式,但是它的压缩率相比较tar.gz和tar.gz2来讲,却要低很多.    压缩命令:zip -r archive_n ...