Answer by Richard Willey on 9 Jan 2012

Hi Michael

At the time of writing, MATLAB doesn't provide a dedicated function for removing outliers. In general, you have a couple of different options for dealing with them.

1. You can create an index that flags potential outliers, and then either delete those points from your data set or substitute more plausible values.

2. You can use robust techniques, such as robust regression, which are less sensitive to the presence of outliers.
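
As a rough illustration of the second option, here is a minimal sketch that compares an ordinary least-squares fit with `robustfit` from the Statistics Toolbox. The synthetic data, the fixed random seed, and the size of the injected errors are all assumptions made up for this example.

```matlab
% Hypothetical data: a straight line plus unit-variance noise
rng(0);                              % arbitrary seed, for repeatability only
x = (1:100)';
y = 3*x + 2 + randn(100,1);

% Inject a few gross errors (assumed values, purely for illustration)
y(11:20:91) = y(11:20:91) + 50;

% Ordinary least squares: the injected errors pull on the coefficients
b_ols = [ones(100,1) x] \ y;         % [intercept; slope]

% Robust regression: iteratively reweighted least squares that
% down-weights observations with large residuals
b_rob = robustfit(x, y);             % [intercept; slope]

% Compare the two sets of coefficients side by side
disp([b_ols b_rob])
```

With errors this large, the robust coefficients should stay close to the true slope and intercept, while the least-squares intercept is shifted by the contaminated points.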

Your choice of strategy will depend a lot on what you know about the data set. For example, if you have a lot of data points coded with a value like -9999, these are probably error codes of some kind rather than actual numeric information.
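
For the error-code case, a logical index is usually all you need. The sketch below assumes a made-up vector in which -9999 marks a missing reading; the variable names are hypothetical.

```matlab
% Hypothetical measurements where -9999 marks a missing reading
data = [12.1; 11.8; -9999; 12.4; -9999; 11.9];

% Flag the sentinel values with a logical index
is_error_code = (data == -9999);

% Option A: replace the sentinel values with NaN so they are easy to skip
cleaned = data;
cleaned(is_error_code) = NaN;

% Option B: drop them and work with the valid observations only
valid_mean = mean(data(~is_error_code));
```

Replacing with NaN keeps the vector the same length, which matters if the observations need to stay aligned with other variables.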

I'm including some simple example code which shows a standard technique to detect outliers.

=====================
% Start from a clean workspace, command window, and plot state
clear
clc
hold off
% Create a column vector of X values
X = (1:100)';
% Create a noise vector
noise = randn(100,1);
% Create a second noise value where sigma is much larger
noise2 = 10*randn(100,1);
% Substitute noise2 for noise at obs# (11, 31, 51, 71, 91)
% Many of these points will have an undue influence on the model
noise(11:20:91) = noise2(11:20:91);
% Specify Y = F(X)
Y = 3*X + 2 + noise;
% Cook's Distance for a given data point measures the extent to
% which a regression model would change if this data point
% were excluded from the regression. Cook's Distance is
% sometimes used to suggest whether a given data point might be an outlier.
% Use regstats to calculate Cook's Distance
stats = regstats(Y,X,'linear');
% Cook's Distance > 4/n, where n is the number of observations, is a
% typical threshold that is used to suggest the presence of an outlier
potential_outlier = stats.cookd > 4/length(X);
% Display the index of potential outliers and graph the results
X(potential_outlier)
scatter(X,Y, 'b.')
hold on
scatter(X(potential_outlier),Y(potential_outlier), 'r.')
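
Once the points are flagged, the first option above (delete or substitute) is a matter of indexing. The following is a minimal, self-contained sketch that regenerates similar data, flags points with the same 4/n rule, and then refits without them. The seed and the variable names `X_clean`, `Y_clean`, `Y_nan`, and `b_clean` are assumptions for illustration, and whether flagged points should really be dropped depends on your data.

```matlab
% Regenerate data similar to the example above (arbitrary seed)
rng(1);
X = (1:100)';
noise = randn(100,1);
noise(11:20:91) = 10*randn(5,1);     % five noisier observations
Y = 3*X + 2 + noise;

% Flag potential outliers using Cook's Distance and the 4/n rule
stats = regstats(Y, X, 'linear', {'cookd'});
potential_outlier = stats.cookd > 4/length(X);

% Delete the flagged observations and refit by least squares
X_clean = X(~potential_outlier);
Y_clean = Y(~potential_outlier);
b_clean = [ones(size(X_clean)) X_clean] \ Y_clean;   % [intercept; slope]

% Alternatively, keep the vector length and mark flagged values as missing
Y_nan = Y;
Y_nan(potential_outlier) = NaN;
```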
