Machine Learning - XV. Anomaly Detection (Week 9)
http://blog.csdn.net/pipisorry/article/details/44783647
Machine Learning - notes on Andrew Ng's course
Anomaly Detection
Problem Motivation
Anomaly detection example
Applications of anomaly detection
Note:
Fraud detection: to model a user's activity on a website, the features might be: x1, how often the user logs in; x2, the number of pages visited, or the number of transactions; x3, the number of posts the user makes on the forum; x4, the user's typing speed. You can then model p(x) from this sort of data.
Gaussian Distribution (Normal Distribution)
The Gaussian distribution
Note:
1. The width of this bell-shaped curve is σ, also called one standard deviation.
2. p(x; μ, σ²) denotes that the probability of x is parametrized by the two parameters μ and σ².
3. p(x) is plotted as a function of x for fixed values of μ and σ²; σ² is called the variance.
Parameter estimation
Note:
1. Suspect that each of these examples was distributed according to a normal (Gaussian) distribution with some parameters μ and σ².
2. The estimate of μ is just the average of the examples; μ is the mean parameter.
3. These estimates are the maximum likelihood estimates of the parameters μ and σ².
4. In statistics the first term is sometimes 1/(m−1) instead of 1/m. In machine learning, people tend to use the 1/m formula, but in practice whether it is 1/m or 1/(m−1) makes essentially no difference, assuming m (the training set size) is reasonably large.
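The parameter estimates above can be sketched in a few lines of NumPy (the function name and toy data here are illustrative, not from the course):

```python
import numpy as np

def estimate_gaussian(X):
    """Maximum-likelihood estimates of mu_j and sigma_j^2 for each feature.

    X is an (m, n) array: m examples, n features. This uses the 1/m
    formula common in machine learning; for large m it is essentially
    identical to the unbiased 1/(m-1) version.
    """
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean(axis=0)  # 1/m * sum of squared deviations
    return mu, sigma2

# Toy data: one feature drawn from N(mu=5, sigma^2=4)
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(10000, 1))
mu, sigma2 = estimate_gaussian(X)  # mu close to 5, sigma2 close to 4
```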
Algorithm
Density estimation
Note:
1. Model p(x) from the data set: figure out which values of the features have high probability and which have low probability.
2. This equation corresponds to an independence assumption on the values of the features x1 through xn. In practice, though, the algorithm works just fine whether or not the features are anywhere close to independent, even if the independence assumption doesn't hold.
3. The problem of estimating this distribution p(x) is sometimes called the problem of density estimation.
4. Each feature has its own μj and σj².
The anomaly detection algorithm
Note:
1. Choose features that describe general properties of the things you're collecting data on.
2. μj is just the mean, over the training set, of the values of the j-th feature.
3. An example is anomalous when the overall probability p(x) of its features is very low; such examples, when they occur, are flagged as anomalies.
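The full algorithm — fit μj and σj² per feature, compute p(x) as the product of per-feature densities, and flag examples with p(x) < ε — can be sketched as follows (ε and the toy data are invented for illustration; ε would normally be chosen on a cross-validation set):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Univariate normal density p(x; mu, sigma^2), element-wise."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def p(x, mu, sigma2):
    """p(x) = product over features j of p(x_j; mu_j, sigma_j^2)."""
    return np.prod(gaussian_pdf(x, mu, sigma2))

# Fit on (unlabeled, mostly good) training data.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(5000, 2))
mu, sigma2 = X.mean(axis=0), X.var(axis=0)

epsilon = 1e-4
typical = np.array([0.1, -0.2])   # near the bulk of the data
outlier = np.array([4.5, 4.5])    # far from everything seen in training
flag_typical = p(typical, mu, sigma2) < epsilon   # not an anomaly
flag_outlier = p(outlier, mu, sigma2) < epsilon   # flagged as an anomaly
```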
Anomaly detection example
{Why use anomaly detection here rather than supervised learning, e.g. an SVM? This is covered in the next two sections.}
Developing and Evaluating an Anomaly Detection System
Note:
1. The training set is unlabeled; the cross-validation and test sets are labeled.
2. For anomaly detection, the goal is to detect the anomalous examples, so anomalous corresponds to y = 1.
Note:
1. We call the training set an unlabeled training set, but all of these examples really correspond to y = 0; it is a training set of all (or the vast majority) good examples.
2. So we use these 6,000 engines to fit p(x): these 6,000 examples are used to estimate the parameters μ1, σ1², up to μn, σn².
3. Some people put the same 4,000 examples in both the cross-validation set and the test set, but we like to think of the cross-validation set and the test set as completely separate data sets; reusing the same data is not considered good machine learning practice.
Algorithm evaluation
Note:
1. These labels will be very skewed, because y = 0 (normal examples) is usually much more common than y = 1 (anomalous examples). So evaluate with precision/recall rather than classification accuracy.
2. To set ε, evaluate the algorithm on the cross-validation set; then, once the features are chosen and the value of ε found, do the final evaluation of the algorithm on the test set.
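A sketch of choosing ε by maximizing the F1 score (computed from precision and recall) on the cross-validation set; the tiny CV set below is invented for illustration:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def select_epsilon(p_cv, y_cv, n_grid=1000):
    """Try candidate thresholds between min and max of p(x) on the CV set;
    keep the one with the best F1."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), n_grid):
        f1 = f1_score(y_cv, (p_cv < eps).astype(int))
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Toy CV set: the two anomalies (y=1) have much lower p(x).
p_cv = np.array([0.30, 0.25, 0.28, 0.31, 0.002, 0.001])
y_cv = np.array([0,    0,    0,    0,    1,     1])
eps, f1 = select_epsilon(p_cv, y_cv)  # a threshold separating the two groups
```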
Anomaly Detection vs. Supervised Learning
{If we have this labeled data, why don't we just use a supervised learning algorithm — logistic regression or a neural network — to learn directly from the labeled data to predict whether y equals one or zero?}
The properties of a learning problem that cause us to treat it as anomaly detection versus supervised learning:
Note:
1. Anomaly detection: when estimating p(x) (fitting all those Gaussian parameters), we need only negative examples. So if you have a lot of negative data, you can still fit p(x) pretty well.
2. Anomaly detection: in anomaly detection applications there are often many different types of anomalies that could break, say, an aircraft engine, and it can be difficult for an algorithm to learn from a small set of positive examples what anomalies look like. In particular, future anomalies may look nothing like the ones seen so far; if there is a new way for an engine to break that you have never seen before, it is more promising to model just the negative examples with a Gaussian model p(x) than to try too hard to model the positive examples.
3. For the spam problem, we usually have enough examples of spam email to see most of the different types of spam. Because we have a large set of spam examples, we usually treat spam as a supervised learning setting, even though there may be many different types of spam.
Some applications of anomaly detection versus supervised learning
Note: if you are a major online retailer and have had many people attempt fraud on your website, fraud detection could actually shift over to the supervised learning column. Likewise, for some manufacturing processes, if you manufacture very large volumes and have seen many bad examples, manufacturing could shift to the supervised learning column as well.
Choosing What Features to Use
Transforming non-Gaussian features into Gaussian features
Note:
1. Even if your data looks non-Gaussian, the algorithm will often work just fine.
2. Play with different transformations of the data in order to make it look more Gaussian.
3. More generally, try transformations such as log(x + c) or x^c for some constant c, choosing the constant to make the feature look as Gaussian as possible.
4. The new feature x_new = x^0.05 looks more Gaussian than the original, so you might feed this new feature into the anomaly detection algorithm instead.
5. You could also plot a histogram of log(x); that's another transformation you can use, and it also looks pretty Gaussian, so you can define x_new = log(x).
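A quick way to check such transformations is to compare the skewness before and after: a Gaussian-looking feature has skewness near 0. This sketch (not from the course) uses log-normally distributed toy data, for which log(x) is exactly Gaussian:

```python
import numpy as np

def skewness(x):
    """Sample skewness; roughly 0 for symmetric, Gaussian-like data."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=1.0, size=20000)  # heavily right-skewed

x_new = np.log(x)          # use log(x + c) if x can be zero
skew_before = skewness(x)      # large and positive
skew_after = skewness(x_new)   # close to zero: much more Gaussian
```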
How to come up with features for an anomaly detection algorithm
Note:
1. Look at the anomalies that the algorithm fails to flag, and see if that inspires you to create some new feature, so that with this new feature it becomes easier to distinguish the anomalies from your good examples.
2. The green × marks an anomalous example. With only one feature it would be misclassified; adding a second feature x2 allows it to be distinguished correctly.
Note:
1. Normally, high CPU load goes together with high network traffic. Suspect a failure case where one of the computers has a job stuck in an infinite loop: the CPU load grows, but the network traffic doesn't, because the machine is just spinning its wheels doing a lot of CPU work. Create a new feature, x5 = CPU load / network traffic.
2. By creating features like these, you can start to capture anomalies that correspond to unusual combinations of values of the features.
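A sketch of the x5 = CPU load / network traffic idea on made-up monitoring data: for normal machines the ratio stays in a narrow band, while the stuck machine's ratio falls far outside it, even though its CPU load alone is not unprecedented:

```python
import numpy as np

# Normal machines: network traffic scales with CPU load (toy model).
rng = np.random.default_rng(3)
cpu = rng.uniform(0.2, 0.9, size=1000)
net = cpu * rng.uniform(0.8, 1.2, size=1000)

# Stuck machine: very high CPU load, almost no network traffic.
stuck_cpu, stuck_net = 0.95, 0.01

x5 = cpu / net                      # hovers near 1 for normal machines
stuck_x5 = stuck_cpu / stuck_net    # far outside the normal range of x5
```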
Multivariate Gaussian Distribution (Optional)
{sometimes catches anomalies that the earlier algorithm doesn't}
Note:
1. Most of the data lies in this region (the blue area), and the green × is pretty far away from any of the data seen so far. It looks like it should be flagged as an anomaly.
2. But for the green ×, p(x1) and p(x2) are each individually fairly normal (it lies within the magenta region), so the original model would not flag it as an anomaly.
Multivariate Gaussian (Normal) distribution
Multivariate Gaussian (Normal) examples
Note:
1. Σ is a covariance matrix and measures the variance, or variability, of the features x1 and x2. The diagonal entries of Σ are the variances; the off-diagonal entries are the covariances, which measure the linear correlation between two features.
2. When x1 and x2 tend to be highly correlated with each other, change the off-diagonal entries of the covariance matrix, which produces a tilted ellipse. As you increase the off-diagonal entries from 0.5 to 0.8, the distribution becomes more and more thinly peaked along the x1 = x2 line.
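The effect of the off-diagonal entries can be checked numerically: with strong positive correlation, a point on the x1 = x2 line gets much higher density than an equally distant point off it. A sketch of the multivariate normal density, implemented directly rather than via a library:

```python
import numpy as np

def multivariate_gaussian(x, mu, Sigma):
    """Density of the multivariate normal N(mu, Sigma) at point x."""
    n = mu.shape[0]
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.zeros(2)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])  # strong positive correlation

on_diag = multivariate_gaussian(np.array([1.0, 1.0]), mu, Sigma)
off_diag = multivariate_gaussian(np.array([1.0, -1.0]), mu, Sigma)
# on_diag is far larger: mass concentrates along the x1 = x2 line
```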
Anomaly Detection using the Multivariate Gaussian Distribution (Optional)
Note: set Σ equal to this expression; it is the same covariance matrix we used with PCA.
Relationship to the original model
Note:
1. The original model corresponds to a special case of the multivariate Gaussian distribution: the special case obtained by constraining the multivariate Gaussian p(x) so that the contours of the probability density function are axis-aligned.
2. The multivariate Gaussian distribution corresponds exactly to the original model when the covariance matrix Σ has only zero elements off the diagonal.
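This equivalence is easy to verify numerically: with Σ = diag(σ1², …, σn²), the multivariate density equals the product of the univariate densities. A small sketch (the test point and parameters are arbitrary):

```python
import numpy as np

def univariate_product(x, mu, sigma2):
    """Original model: p(x) = prod_j p(x_j; mu_j, sigma_j^2)."""
    return np.prod(np.exp(-((x - mu) ** 2) / (2 * sigma2))
                   / np.sqrt(2 * np.pi * sigma2))

def multivariate_gaussian(x, mu, Sigma):
    """Multivariate normal density N(mu, Sigma) at point x."""
    n = mu.shape[0]
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([1.0, -2.0])
sigma2 = np.array([0.5, 2.0])
x = np.array([0.7, -1.1])

p_original = univariate_product(x, mu, sigma2)
p_multivariate = multivariate_gaussian(x, mu, np.diag(sigma2))
# the two models agree exactly when Sigma is diagonal
```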
Note:
1. The multivariate Gaussian model has a lot of parameters: the covariance matrix Σ is n×n and has roughly n² parameters (because it is symmetric, actually closer to n²/2). That is still a lot of parameters, so make sure m is fairly large — that you have enough data to fit them all.
2. m ≥ 10n is a reasonable rule of thumb to make sure you can estimate the covariance matrix Σ reasonably well.
3. In problems with a very large training set (m very large) and n not too large, the multivariate Gaussian model is well worth considering and may work better; it can save you from having to spend time manually creating extra features in case the anomalies turn out to be captured by unusual combinations of values of the features.
4. If the covariance matrix Σ is non-invertible, there are usually two causes. One is failing to satisfy the m > n condition; the second is having redundant features — e.g. two features that are identical (x1 equal to x2), or a feature that is a linear combination of others (say x3 = x4 + x5), in which case x3 contains no extra information.
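The redundant-feature case can be seen directly: appending x4 = x1 + x2 to a data set makes the sample covariance matrix singular (rank-deficient, determinant ≈ 0), so it cannot be inverted. A sketch on random toy data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
X_redundant = np.column_stack([X, X[:, 0] + X[:, 1]])  # x4 = x1 + x2

Sigma = np.cov(X_redundant, rowvar=False)  # 4x4 sample covariance
rank = np.linalg.matrix_rank(Sigma)        # 3, not 4: Sigma is singular
det = np.linalg.det(Sigma)                 # numerically zero
```

Deleting the redundant feature (or collecting m > n examples) restores invertibility.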
from:http://blog.csdn.net/pipisorry/article/details/44783647
ref: "Anomaly Detection with Apache Spark" — anomaly detection on Spark
"Beyond Trending Topics: identifying important conversations in communities" — anomaly detection for finding key discussions in communities
Anomaly detection at Netflix — the anomalous-server detection techniques Netflix uses
"Topological Anomaly Detection"
Open source (R): CerioliOutlierDetection — multivariate outlier detection based on Mahalanobis distance / the Cerioli method