Machine learning in 10 pictures
I find myself coming back to the same few pictures when explaining basic machine learning concepts. Below are the ones I find most illuminating.

1. Test and training error: Why lower training error is not always a good thing: ESL Figure 2.11. Test and training error as a function of model complexity.
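
A minimal sketch of that curve, assuming noisy samples of sin(2πx) and polynomial fits of increasing degree (both choices are mine, not the book's): training error falls monotonically with degree, while test error is U-shaped.

```python
# Sketch: training error keeps falling with model complexity while test
# error is U-shaped (cf. ESL Figure 2.11). Data and degrees are made up.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
f = lambda x: np.sin(2 * np.pi * x)                 # assumed true function
x_train = rng.rand(30, 1)
y_train = f(x_train).ravel() + 0.2 * rng.randn(30)
x_test = rng.rand(200, 1)
y_test = f(x_test).ravel() + 0.2 * rng.randn(200)

for degree in [1, 3, 9, 15]:                        # increasing complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(x_train))
    test_err = mean_squared_error(y_test, model.predict(x_test))
    print(f"degree {degree:2d}: train {train_err:.3f}  test {test_err:.3f}")
```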

2. Under and overfitting: PRML Figure 1.4. Plots of polynomials having various orders M, shown as red curves, fitted to the data set generated by the green curve.
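
A rough numerical recreation of the setup, assuming 10 noisy samples of the green curve sin(2πx) and the orders M shown in the book; the give-away for M = 9 is the training error collapsing to zero while the coefficients explode:

```python
# Sketch: polynomials of order M fitted to 10 noisy samples of sin(2*pi*x),
# in the spirit of PRML Figure 1.4. M = 9 interpolates the noise exactly.
import numpy as np

rng = np.random.RandomState(1)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.3 * rng.randn(10)   # noise level is a guess

for M in [0, 1, 3, 9]:                            # orders from the figure
    coeffs = np.polyfit(x, t, M)                  # fitted curve of order M
    rms = np.sqrt(np.mean((np.polyval(coeffs, x) - t) ** 2))
    print(f"M={M}: train RMS={rms:.3f}, max |coef|={np.abs(coeffs).max():.1f}")
# M = 9 drives the training error to ~0, but its coefficients blow up and
# the fitted curve oscillates wildly between the data points: overfitting.
```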

3. Occam's razor: ITILA Figure 28.3. Why Bayesian inference embodies Occam’s razor. This figure gives the basic intuition for why complex models can turn out to be less probable. The horizontal axis represents the space of possible data sets D. Bayes’ theorem rewards models in proportion to how much they predicted the data that occurred. These predictions are quantified by a normalized probability distribution on D. This probability of the data given model Hi, P(D|Hi), is called the evidence for Hi. A simple model H1 makes only a limited range of predictions, shown by P(D|H1); a more powerful model H2, which has, for example, more free parameters than H1, is able to predict a greater variety of data sets. This means, however, that H2 does not predict the data sets in region C1 as strongly as H1. Suppose that equal prior probabilities have been assigned to the two models. Then, if the data set falls in region C1, the less powerful model H1 will be the more probable model.
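
A toy numerical version of the argument (my own example, not MacKay's): two models whose uniform predictive distributions cover ranges of different sizes, so the one that concentrates its predictions earns higher evidence for data it can explain:

```python
# Sketch: Bayesian Occam's razor with uniform predictive distributions
# over integer "data sets". Supports and the observed D are hypothetical.
h1_support = range(0, 10)     # simple model H1: can explain 10 data sets
h2_support = range(-50, 50)   # flexible model H2: can explain 100 data sets

def evidence(D, support):
    # P(D|H): a normalized (uniform) predictive distribution over D.
    return 1 / len(support) if D in support else 0.0

D = 7                          # observed data: both models can explain it
print("P(D|H1) =", evidence(D, h1_support))   # 0.1
print("P(D|H2) =", evidence(D, h2_support))   # 0.01
# With equal priors, posterior odds = evidence ratio, so H1 is 10x more
# probable: the flexible H2 spread its predictions too thin (region C1).
```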

4. Feature combinations: (1) Why collectively relevant features may look individually irrelevant, and (2) why linear methods may fail. From Isabelle Guyon's feature extraction slides.
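
The classic concrete instance is XOR. A minimal sketch, with the data distribution and classifiers chosen by me for illustration: each feature alone is uninformative, a linear model on both still fails, but a non-linear model separates the classes:

```python
# Sketch: XOR-style labels, where each feature alone is irrelevant but the
# pair is decisive, and no linear boundary can exist.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(2)
X = rng.randn(400, 2)
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)   # label = XOR of the signs

for j in (0, 1):  # each feature alone: ~0.5 accuracy
    print("feature", j, LogisticRegression().fit(X[:, [j]], y).score(X[:, [j]], y))
print("linear, both:", LogisticRegression().fit(X, y).score(X, y))      # still ~0.5
print("tree, both:  ", DecisionTreeClassifier().fit(X, y).score(X, y))  # ~1.0
```
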
5. Irrelevant features: Why irrelevant features hurt kNN, clustering, and other similarity-based methods. The figure on the left shows two classes well separated on the vertical axis. The figure on the right adds an irrelevant horizontal axis, which destroys the grouping and makes many points nearest neighbors of the opposite class.
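
A small sketch of the effect, with a made-up data set: one informative feature, to which we append more and more pure-noise features, watching kNN cross-validation accuracy fall as the distances become dominated by the irrelevant axes:

```python
# Sketch: kNN accuracy collapses as pure-noise features are appended,
# because Euclidean distance gets dominated by the irrelevant axes.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(3)
n = 200
y = rng.randint(0, 2, n)
relevant = 2.0 * y[:, None] + 0.3 * rng.randn(n, 1)   # classes well separated

for n_noise in [0, 1, 10, 100]:
    X = np.hstack([relevant, rng.randn(n, n_noise)])  # append irrelevant axes
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5).mean()
    print(f"{n_noise:3d} irrelevant features -> kNN accuracy {acc:.2f}")
```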

6. Basis functions: How non-linear basis functions turn a low-dimensional classification problem without a linear boundary into a high-dimensional problem with a linear boundary. From SVM tutorial slides by Andrew Moore: a one-dimensional non-linear classification problem with input x is turned into a 2-D problem z = (x, x^2) that is linearly separable.
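
Moore's example is easy to reproduce. A sketch under my own choice of data (label 1 iff |x| < 1): no single threshold on x separates the classes, but after lifting to z = (x, x^2) a linear classifier succeeds:

```python
# Sketch: a 1-D problem with no linear boundary becomes linearly separable
# after the basis expansion z = (x, x^2).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(4)
x = rng.uniform(-2, 2, 300)
y = (np.abs(x) < 1).astype(int)   # inner interval vs. outer: needs two thresholds

clf1 = LogisticRegression(max_iter=1000).fit(x[:, None], y)
print(clf1.score(x[:, None], y))  # ~0.5: no useful linear boundary in x

z = np.column_stack([x, x ** 2])  # lift to 2-D with a quadratic basis function
clf2 = LogisticRegression(max_iter=1000).fit(z, y)
print(clf2.score(z, y))           # ~1.0: the boundary is the line x^2 = 1
```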

7. Discriminative vs. Generative: Why discriminative learning may be easier than generative: PRML Figure 1.27. Example of the class-conditional densities for two classes having a single input variable x (left plot) together with the corresponding posterior probabilities (right plot). Note that the left-hand mode of the class-conditional density p(x|C1), shown in blue on the left plot, has no effect on the posterior probabilities. The vertical green line in the right plot shows the decision boundary in x that gives the minimum misclassification rate.
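
A small numerical sketch in the spirit of the figure (densities and priors are my choices): a bimodal class-conditional for C1 whose left-hand mode lies far from C2, so the posterior there is already saturated at 1:

```python
# Sketch: a posterior p(C1|x) that is flat where the class-conditional
# p(x|C1) has structure (cf. PRML Figure 1.27). Densities are made up.
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 5, 11)
p_x_c1 = 0.5 * norm.pdf(x, -3, 0.6) + 0.5 * norm.pdf(x, 0, 0.6)  # bimodal
p_x_c2 = norm.pdf(x, 2, 0.6)

posterior_c1 = p_x_c1 / (p_x_c1 + p_x_c2)   # Bayes' rule with equal priors
for xi, p in zip(x, posterior_c1):
    print(f"x={xi:+.1f}  p(C1|x)={p:.3f}")
# Around x = -3 the posterior is already ~1: the left-hand mode of p(x|C1)
# changes nothing about the decision, yet a generative model must fit it.
```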

8. Loss functions: Learning algorithms can be viewed as optimizing different loss functions: PRML Figure 7.5. Plot of the ‘hinge’ error function used in support vector machines, shown in blue, along with the error function for logistic regression, rescaled by a factor of 1/ln(2) so that it passes through the point (0, 1), shown in red. Also shown are the misclassification error in black and the squared error in green.
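
These four error functions are easy to tabulate as functions of the margin z = y·f(x), with targets y ∈ {−1, +1}; a minimal sketch:

```python
# Sketch: the four error functions of PRML Figure 7.5, tabulated as
# functions of the margin z = y * f(x) with targets y in {-1, +1}.
import math

def hinge(z):    return max(0.0, 1.0 - z)                         # SVM
def logistic(z): return math.log(1 + math.exp(-z)) / math.log(2)  # rescaled by 1/ln 2
def misclass(z): return 1.0 if z <= 0 else 0.0                    # 0-1 loss
def squared(z):  return (1.0 - z) ** 2                            # least squares

for z in [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0]:
    print(f"z={z:+.1f}  hinge={hinge(z):.2f}  logistic={logistic(z):.2f}  "
          f"0/1={misclass(z):.0f}  squared={squared(z):.2f}")
# Hinge and rescaled logistic are convex upper bounds on the 0-1 loss
# (both pass through (0, 1)); squared error keeps penalizing even the
# confidently correct points with z > 1.
```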

9. Geometry of least squares: ESL Figure 3.2. The N-dimensional geometry of least squares regression with two predictors. The outcome vector y is orthogonally projected onto the hyperplane spanned by the input vectors x1 and x2. The projection ŷ represents the vector of the least squares predictions.
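
The projection interpretation is easy to verify numerically; a sketch with random data of my choosing (N = 50, two predictors):

```python
# Sketch: least squares as the orthogonal projection of y onto the plane
# spanned by the columns x1, x2 (cf. ESL Figure 3.2). Data is random.
import numpy as np

rng = np.random.RandomState(5)
X = rng.randn(50, 2)                   # columns x1, x2 span a plane in R^50
y = rng.randn(50)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta                       # the projection of y onto that plane
residual = y - y_hat

print(X.T @ residual)                  # ~[0, 0]: residual orthogonal to x1, x2
P = X @ np.linalg.inv(X.T @ X) @ X.T   # the hat matrix X (X^T X)^{-1} X^T
print(np.allclose(P @ y, y_hat))       # True: y_hat is P y
```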

10. Sparsity: Why Lasso (L1 regularization or Laplacian prior) gives sparse solutions (i.e. weight vectors with more zeros): ESL Figure 3.11. Estimation picture for the lasso (left) and ridge regression (right). Shown are contours of the error and constraint functions. The solid blue areas are the constraint regions |β1| + |β2| ≤ t and β1^2 + β2^2 ≤ t^2, respectively, while the red ellipses are the contours of the least squares error function.
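
A quick sketch of the consequence (data and regularization strengths are arbitrary choices of mine): on the same sparse ground truth, the lasso sets most irrelevant weights exactly to zero, while ridge merely shrinks them:

```python
# Sketch: on a sparse ground truth, L1 produces exact zeros (the corners
# of the diamond), while L2 only shrinks (the circle has no corners).
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(6)
X = rng.randn(100, 10)
true_beta = np.zeros(10)
true_beta[:3] = [3.0, -2.0, 1.5]       # only 3 of 10 features matter
y = X @ true_beta + 0.5 * rng.randn(100)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("lasso zeros:", int(np.sum(lasso.coef_ == 0)))  # most irrelevant weights: 0
print("ridge zeros:", int(np.sum(ridge.coef_ == 0)))  # typically none
print("lasso:", np.round(lasso.coef_, 2))
print("ridge:", np.round(ridge.coef_, 2))
```
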
From: http://www.denizyuret.com/2014/02/machine-learning-in-5-pictures.html