Answer by Richard Willey on 9 Jan 2012

Hi Michael

MATLAB doesn't provide a specific function to remove outliers. In general you have a couple different options to deal with outliers.

1. You can create an index that flags potential outliers and either delete them from your data set or substitute more plausible values

2. You can use robust techniques like robust regression which are less sensitive to the presence of outliers.

Your choice of strategies will depend a lot on your knowledge about the data set. For example, if you have a lot of data points that are coded with a value like -9999 these are probably error codes of some kind rather than actual numeric information.

I'm including some simple example code which shows a standard technique to detect outliers.

=====================
% Create a vector of X values
clear all
clc
hold off
X = 1:100;
X = X';
% Create a noise vector
noise = randn(100,1);
% Create a second noise value where sigma is much larger
noise2 = 10*randn(100,1);
% Substitute noise2 for noise1 at obs# (11, 31, 51, 71, 91)
% Many of these points will have an undue influence on the model
noise(11:20:91) = noise2(11:20:91);
% Specify Y = F(X)
Y = 3*X + 2 + noise;
% Cook's Distance for a given data point measures the extent to
% which a regression model would change if this data point
% were excluded from the regression. Cook's Distance is
% sometimes used to suggest whether a given data point might be an outlier.
% Use regstats to calculate Cook's Distance
stats = regstats(Y,X,'linear');
% if Cook's Distance > n/4 is a typical treshold that is used to suggest
% the presence of an outlier
potential_outlier = stats.cookd > 4/length(X);
% Display the index of potential outliers and graph the results
X(potential_outlier)
scatter(X,Y, 'b.')
hold on
scatter(X(potential_outlier),Y(potential_outlier), 'r.')

MATLAB remove outliers.的更多相关文章

  1. matlab中的containers.Map()

    matlab中的containers.Map() 标签: matlabcontainers.Map容器map 2015-10-27 12:45 1517人阅读 评论(1) 收藏 举报  分类: Mat ...

  2. Taxi Trip Time Winners' Interview: 3rd place, BlueTaxi

    Taxi Trip Time Winners' Interview: 3rd place, BlueTaxi This spring, Kaggle hosted two competitions w ...

  3. 异常值处理outlier

    python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...

  4. 壁虎书1 The Machine Learning Landscape

    属性与特征: attribute: e.g., 'Mileage' feature: an attribute plus its value, e.g., 'Mileage = 15000' Note ...

  5. 第三课 创建函数 - 从EXCEL读取 - 导出到EXCEL - 异常值 - Lambda函数 - 切片和骰子数据

    第 3 课   获取数据 - 我们的数据集将包含一个Excel文件,其中包含每天的客户数量.我们将学习如何对 excel 文件进​​行处理.准备数据 - 数据是有重复日期的不规则时间序列.我们将挑战数 ...

  6. Learning Spark中文版--第六章--Spark高级编程(2)

    Working on a Per-Partition Basis(基于分区的操作) 以每个分区为基础处理数据使我们可以避免为每个数据项重做配置工作.如打开数据库连接或者创建随机数生成器这样的操作,我们 ...

  7. Matlab的标记分水岭分割算法

    1 综述 Separating touching objects in an image is one of the more difficult image processing operation ...

  8. Matlab编程基础

    平台:Win7 64 bit,Matlab R2014a(8.3) “Matlab”是“Matrix Laboratory” 的缩写,中文“矩阵实验室”,是强大的数学工具.本文侧重于Matlab的编程 ...

  9. Matlab 进阶学习记录

    最近在看 Faster RCNN的Matlab code,发现很多matlab技巧,在此记录: 1. conf_proposal  =  proposal_config('image_means', ...

随机推荐

  1. OpenSSL再爆多处高危漏洞

    OpenSSL团队于北京时间6月5号晚8点左右发布了5个安全补丁,这次的更新涉及多处高危漏洞,连接:http://www.openssl.org/news/ 受影响的版本包括: OpenSSL 1.0 ...

  2. ASP.NET中MEMCACHED

    一,准备        你需要有一下软件:       VS.NET(05/08)       SQLSERVER       memcached服务器端以及客户端类库(开源软件,下载即可)其中,客户 ...

  3. Androidstudio下Generate signed apk提示Error: Expected resource of type id [ResourceType]解决办法

    只需要在报错位置所在的类上面添加: @SuppressWarnings("ResourceType") 即可实现Generate signed apk.

  4. JS全局变量VAR和THIS

    (注意)JS全局变量VAR和THIS 很多人都觉得在javascript声明一个变量,加var和不加var没有什么区别,实际上是一个错误的观点,如果在函数外面,也就是说在window区域加不加var确 ...

  5. Nginx - 指定log_format,常用于 Awstats 分析

    1. vim /etc/nginx/nginx.conf (下面格式, Awstats 使用) log_format new_log '$remote_addr - $remote_user [$ti ...

  6. 【转】VS2012编译出来的程序,在XP上运行,出现“.exe 不是有效的 win32 应用程序” “not a valid win32 application”

    原文网址:http://www.cnblogs.com/Dageking/archive/2013/05/15/3079394.html VS2012编译出来的程序,在XP上运行,出现“.exe 不是 ...

  7. 利用c#反射实现实体类生成以及数据获取与赋值

    转:http://hi.baidu.com/xyd21c/item/391da2fc8fb351c10dd1c8b8 原有的实体类成员逐个赋值与获取的方法弊端: 1.每次对实体类属性进行赋值时,都要检 ...

  8. SharePoint 2010 列表项事件接收器 ItemAdded 的使用方法

    列表项事件处理器是继承于Microsoft.SharePoint.SPItemEventReceiver的类,Microsoft.SharePoint.SPItemEventReceiver类提供了许 ...

  9. 【c++内存分布系列】单继承

    父类包括成员函数.静态函数.静态方法,子类包括成员函数.静态函数.静态方法的情况与一个类时完全一致,这里就不做分析了.子类单独包含虚函数时继承无关,也不做分析了. 一.父类子类都为空 #include ...

  10. Shell获取Aix/linux/unix机器上db2和os的信息并上传到指定服务器

    (之前写过一篇类似的文章,当时传输文件用的是ftp,因为项目觉得ftp不够安全所以这次换成了scp,同时对脚本的一些地方也做了一些调整) 其实做这个东西还是因为项目的需求,需要获取某些机器(目前主要是 ...