MATLAB remove outliers.
Hi Michael
MATLAB doesn't provide a specific function to remove outliers. In general you have a couple different options to deal with outliers.
1. You can create an index that flags potential outliers and either delete them from your data set or substitute more plausible values
2. You can use robust techniques like robust regression which are less sensitive to the presence of outliers.
Your choice of strategies will depend a lot on your knowledge about the data set. For example, if you have a lot of data points that are coded with a value like -9999 these are probably error codes of some kind rather than actual numeric information.
I'm including some simple example code which shows a standard technique to detect outliers.
=====================
% Create a vector of X values
clear all
clc
hold off
X = 1:100;
X = X';
% Create a noise vector
noise = randn(100,1);
% Create a second noise value where sigma is much larger
noise2 = 10*randn(100,1);
% Substitute noise2 for noise1 at obs# (11, 31, 51, 71, 91)
% Many of these points will have an undue influence on the model
noise(11:20:91) = noise2(11:20:91);
% Specify Y = F(X)
Y = 3*X + 2 + noise;
% Cook's Distance for a given data point measures the extent to
% which a regression model would change if this data point
% were excluded from the regression. Cook's Distance is
% sometimes used to suggest whether a given data point might be an outlier.
% Use regstats to calculate Cook's Distance
stats = regstats(Y,X,'linear');
% if Cook's Distance > n/4 is a typical treshold that is used to suggest
% the presence of an outlier
potential_outlier = stats.cookd > 4/length(X);
% Display the index of potential outliers and graph the results
X(potential_outlier)
scatter(X,Y, 'b.')
hold on
scatter(X(potential_outlier),Y(potential_outlier), 'r.')
MATLAB remove outliers.的更多相关文章
- matlab中的containers.Map()
matlab中的containers.Map() 标签: matlabcontainers.Map容器map 2015-10-27 12:45 1517人阅读 评论(1) 收藏 举报 分类: Mat ...
- Taxi Trip Time Winners' Interview: 3rd place, BlueTaxi
Taxi Trip Time Winners' Interview: 3rd place, BlueTaxi This spring, Kaggle hosted two competitions w ...
- 异常值处理outlier
python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...
- 壁虎书1 The Machine Learning Landscape
属性与特征: attribute: e.g., 'Mileage' feature: an attribute plus its value, e.g., 'Mileage = 15000' Note ...
- 第三课 创建函数 - 从EXCEL读取 - 导出到EXCEL - 异常值 - Lambda函数 - 切片和骰子数据
第 3 课 获取数据 - 我们的数据集将包含一个Excel文件,其中包含每天的客户数量.我们将学习如何对 excel 文件进行处理.准备数据 - 数据是有重复日期的不规则时间序列.我们将挑战数 ...
- Learning Spark中文版--第六章--Spark高级编程(2)
Working on a Per-Partition Basis(基于分区的操作) 以每个分区为基础处理数据使我们可以避免为每个数据项重做配置工作.如打开数据库连接或者创建随机数生成器这样的操作,我们 ...
- Matlab的标记分水岭分割算法
1 综述 Separating touching objects in an image is one of the more difficult image processing operation ...
- Matlab编程基础
平台:Win7 64 bit,Matlab R2014a(8.3) “Matlab”是“Matrix Laboratory” 的缩写,中文“矩阵实验室”,是强大的数学工具.本文侧重于Matlab的编程 ...
- Matlab 进阶学习记录
最近在看 Faster RCNN的Matlab code,发现很多matlab技巧,在此记录: 1. conf_proposal = proposal_config('image_means', ...
随机推荐
- HNOI2010弹飞绵羊
不得不说块状数组好神奇的啊!这道题的标签可是splay的启发是合并(什么高大上的东西),竟然这么轻松的就解决了! var x,y,i,j,tot,n,m,ch:longint; f,k,l,bl,go ...
- TCP/IP详解学习笔记(12)-TCP的超时与重传
超时重传是TCP协议保证数据可靠性的另一个重要机制,其原理是在发送某一个数据以后就开启一个计时器,在一定时间内如果没有得到发送的数据报的ACK报文,那么就重新发送数据,直到发送成功为止. 1.超时 超 ...
- SQL之50个常用的SQL语句
50个常用的sql语句 Student(S#,Sname,Sage,Ssex) 学生表 Course(C#,Cname,T#) 课程表 SC(S#,C#,score) 成绩表 Teacher(T#,T ...
- hdu 3336 count the string(KMP+dp)
题意: 求给定字符串,包含的其前缀的数量. 分析: 就是求所有前缀在字符串出现的次数的和,可以用KMP的性质,以j结尾的串包含的串的数量,就是next[j]结尾串包含前缀的数量再加上自身是前缀,dp[ ...
- 两段简单的JS代码防止SQL注入
1.URL地址防注入: //过滤URL非法SQL字符var sUrl=location.search.toLowerCase();var sQuery=sUrl.substring(sUrl.inde ...
- SqlServer获取两个日期时间差
SELECT datediff(yy,'2010-06-1 10:10',GETDATE()) --计算多少年 SELECT datediff(q,'2011-01-1 10:10',GETDATE( ...
- Eclipse中设置在创建新类时自动生成注释的方法
windows–>preference Java–>Code Style–>Code Templates code–>new Java files 编辑它 ${filecom ...
- 从零开始完整Electron桌面开发(1)搭建开发环境
[OTC] # 需要知识 1. 简单的html.javascript.css知识,就是web前端入门知识. 2. 简单命令行的应用,不会也没关系,照着代码敲就行. 3. 下载安装就不说了吧. 4. 本 ...
- 机器学习真的可以起作用吗?(3)(以二维PLA为例)
前两篇文章已经完成了大部分的工作,这篇文章主要是讲VC bound和 VC dimension这两个概念. (一)前文的一点补充 根据前面的讨论,我们似乎只需要用来替代来源的M就可以了,但是实际公式却 ...
- KMP(字符串匹配)
1.KMP是一种用来进行字符串匹配的算法,首先我们来看一下普通的匹配算法: 现在我们要在字符串ababcabcacbab中找abcac是不是存在,那么传统的查找方法就是一个个的匹配了,如图: 经过六趟 ...