I find myself coming back to the same few pictures when explaining basic machine learning concepts. Below is the list I find most illuminating.

1. Test and training error: Why lower training error is not always a good thing: ESL Figure 2.11. Test and training error as a function of model complexity.
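
A quick numerical sketch of the same picture (toy data of my own, not from ESL): fit polynomials of growing degree to noisy samples of sin(2πx) and watch training error keep falling while test error turns back up.

```python
import numpy as np

rng = np.random.default_rng(0)
x_tr, x_te = rng.uniform(0, 1, 30), rng.uniform(0, 1, 200)
y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0, 0.3, 30)    # toy data; the assumed "true" curve is sin(2*pi*x)
y_te = np.sin(2 * np.pi * x_te) + rng.normal(0, 0.3, 200)

for degree in [1, 3, 9, 15]:                                 # model complexity grows
    w = np.polyfit(x_tr, y_tr, degree)
    train_mse = np.mean((np.polyval(w, x_tr) - y_tr) ** 2)
    test_mse = np.mean((np.polyval(w, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```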

2. Under and overfitting: PRML Figure 1.4. Plots of polynomials having various orders M, shown as red curves, fitted to the data set generated by the green curve.
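
To reproduce the flavour of this figure (a rough sketch with made-up noise, not Bishop's code): the order-9 polynomial fits the ten training points almost exactly, but its coefficients explode, the usual signature of overfitting.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 10)   # ten noisy samples of the green curve

for M in [0, 1, 3, 9]:
    w = np.polyfit(x, t, M)                           # the red curve of order M
    rms = np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))
    print(f"M={M}: training RMS error {rms:.3f}, largest |coefficient| {np.abs(w).max():.1f}")
```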

3. Occam's razor: ITILA Figure 28.3. Why Bayesian inference embodies Occam's razor. This figure gives the basic intuition for why complex models can turn out to be less probable. The horizontal axis represents the space of possible data sets D. Bayes' theorem rewards models in proportion to how much they predicted the data that occurred. These predictions are quantified by a normalized probability distribution on D. This probability of the data given model Hi, P(D|Hi), is called the evidence for Hi. A simple model H1 makes only a limited range of predictions, shown by P(D|H1); a more powerful model H2, that has, for example, more free parameters than H1, is able to predict a greater variety of data sets. This means, however, that H2 does not predict the data sets in region C1 as strongly as H1. Suppose that equal prior probabilities have been assigned to the two models. Then, if the data set falls in region C1, the less powerful model H1 will be the more probable model.
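
The same intuition fits in a few lines of arithmetic (my own toy coin example, not from the book): H1 says the coin is fair, H2 gives its bias a uniform prior. H2 can explain any outcome, so it spreads its predictive probability thinly, and for an unremarkable data set the evidence favours the simpler H1.

```python
from math import comb

N, k = 20, 11                                # toy data: 11 heads in 20 flips
evidence_H1 = comb(N, k) * 0.5 ** N          # H1: fair coin, no free parameters
evidence_H2 = 1 / (N + 1)                    # H2: bias ~ Uniform(0,1); the Binomial integrates to 1/(N+1)
print(f"P(D|H1) = {evidence_H1:.4f}, P(D|H2) = {evidence_H2:.4f}")
# With equal priors, the posterior odds equal this evidence ratio (the Bayes factor).
print(f"evidence ratio H1/H2 = {evidence_H1 / evidence_H2:.2f}")
```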

4. Feature combinations: (1) why collectively relevant features may look individually irrelevant, and (2) why linear methods may fail. From Isabelle Guyon's feature extraction slides.
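
An XOR-style toy problem makes both points concrete (a sketch in the spirit of the slides, with data I made up): each feature alone is useless, a linear model on both is still useless, and only the feature combination, exposed here through a product feature, separates the classes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)     # XOR-like labels: neither feature is informative alone

clf = LogisticRegression()
print("feature 1 only :", cross_val_score(clf, X[:, [0]], y, cv=5).mean())   # ~0.5
print("feature 2 only :", cross_val_score(clf, X[:, [1]], y, cv=5).mean())   # ~0.5
print("both, linear   :", cross_val_score(clf, X, y, cv=5).mean())           # ~0.5: linear methods fail
X_aug = np.column_stack([X, X[:, 0] * X[:, 1]])                              # add the interaction feature
print("with x1*x2     :", cross_val_score(clf, X_aug, y, cv=5).mean())       # ~1.0
```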

5. Irrelevant features: Why irrelevant features hurt kNN, clustering, and other similarity-based methods. The figure on the left shows two classes well separated on the vertical axis. The figure on the right adds an irrelevant horizontal axis, which destroys the grouping and makes many points nearest neighbors of the opposite class.
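
A minimal sketch of the effect (made-up data, using scikit-learn's kNN): the classes are cleanly separated on one relevant feature, and appending a single high-variance irrelevant feature is enough to drag kNN accuracy down because it dominates the distances.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
relevant = np.where(y == 0, rng.normal(0, 1, 200), rng.normal(4, 1, 200))  # classes well separated
irrelevant = rng.normal(0, 10, 200)                                        # pure noise on a larger scale

knn = KNeighborsClassifier(n_neighbors=5)
print("relevant axis only:", cross_val_score(knn, relevant[:, None], y, cv=5).mean())
both = np.column_stack([relevant, irrelevant])
print("plus irrelevant   :", cross_val_score(knn, both, y, cv=5).mean())   # noise dominates the distances
```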

6. Basis functions: How non-linear basis functions turn a low-dimensional classification problem without a linear boundary into a high-dimensional problem with a linear boundary. From SVM tutorial slides by Andrew Moore: a one-dimensional non-linear classification problem with input x is turned into a 2-D problem z = (x, x^2) that is linearly separable.
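
Here is the same trick on a toy 1-D problem of my own: the class depends on |x|, so no threshold on x works, but after the expansion z = (x, x^2) a plain linear classifier separates it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 300)
y = (np.abs(x) < 1).astype(int)                  # class depends on |x|: no single threshold on x works

clf = LogisticRegression()
print("accuracy on x        :", clf.fit(x[:, None], y).score(x[:, None], y))   # ~ majority-class rate
z = np.column_stack([x, x ** 2])                 # basis expansion z = (x, x^2)
print("accuracy on (x, x^2) :", clf.fit(z, y).score(z, y))                     # ~1.0, now linearly separable
```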

7. Discriminative vs. Generative: Why discriminative learning may be easier than generative: PRML Figure 1.27. Example of the class-conditional densities for two classes having a single input variable x (left plot) together with the corresponding posterior probabilities (right plot). Note that the left-hand mode of the class-conditional density p(x|C1), shown in blue on the left plot, has no effect on the posterior probabilities. The vertical green line in the right plot shows the decision boundary in x that gives the minimum misclassification rate.
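
A small numeric illustration with densities I picked to mimic the figure (not Bishop's numbers): over the region of the left-hand mode the posterior p(C1|x) is already essentially 1, so that mode tells us nothing about where the decision boundary goes.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 10, 2000)
p_x_c1 = 0.3 * norm.pdf(x, -2, 0.7) + 0.7 * norm.pdf(x, 2, 1.0)   # bimodal p(x|C1)
p_x_c2 = norm.pdf(x, 5, 1.0)                                       # unimodal p(x|C2)
post_c1 = p_x_c1 / (p_x_c1 + p_x_c2)                               # posterior p(C1|x), equal priors

left_mode = (x > -3) & (x < -1)                                    # region of the extra mode of p(x|C1)
print("p(C1|x) over the left mode:",
      round(float(post_c1[left_mode].min()), 6), "to", round(float(post_c1[left_mode].max()), 6))
print("decision boundary near x =", round(float(x[np.argmin(np.abs(post_c1 - 0.5))]), 2))
```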

8. Loss functions: Learning algorithms can be viewed as optimizing different loss functions: PRML Figure 7.5. Plot of the ‘hinge’ error function used in support vector machines, shown in blue, along with the error function for logistic regression, rescaled by a factor of 1/ln(2) so that it passes through the point (0, 1), shown in red. Also shown are the misclassification error in black and the squared error in green.
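
The four curves can be written directly as functions of the margin z = y·f(x); this is just a sketch following the caption's conventions:

```python
import numpy as np

z = np.linspace(-2, 2, 9)                          # margin z = y * f(x)
hinge = np.maximum(0, 1 - z)                       # SVM hinge loss
logistic = np.log1p(np.exp(-z)) / np.log(2)        # logistic loss, rescaled by 1/ln(2) to pass through (0, 1)
misclass = (z <= 0).astype(float)                  # 0-1 misclassification error
squared = (1 - z) ** 2                             # squared error
for name, loss in [("hinge", hinge), ("logistic", logistic), ("0-1", misclass), ("squared", squared)]:
    print(f"{name:9s}", np.round(loss, 2))
```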

9. Geometry of least squares: ESL Figure 3.2. The N-dimensional geometry of least squares regression with two predictors. The outcome vector y is orthogonally projected onto the hyperplane spanned by the input vectors x1 and x2. The projection ŷ represents the vector of the least squares predictions.
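
The projection story is easy to check numerically (toy data, no intercept, mirroring the two-predictor setup): the least squares residual comes out orthogonal to both input vectors and hence to the fitted vector ŷ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=50), rng.normal(size=50)])   # two predictors x1, x2
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.5, 50)        # toy outcome

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta                                  # the projection of y onto span{x1, x2}
r = y - y_hat                                     # residual
print("X^T r   =", np.round(X.T @ r, 12))         # ~0: residual is orthogonal to x1 and x2
print("y_hat.r =", round(float(y_hat @ r), 12))   # ~0: hence orthogonal to the projection itself
```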

10. Sparsity: Why Lasso (L1 regularization or Laplacian prior) gives sparse solutions (i.e. weight vectors with more zeros): ESL Figure 3.11. Estimation picture for the lasso (left) and ridge regression (right). Shown are contours of the error and constraint functions. The solid blue areas are the constraint regions |β1| + |β2| ≤ t and β1² + β2² ≤ t², respectively, while the red ellipses are the contours of the least squares error function.
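
A quick empirical check of the cartoon (toy data and an arbitrary regularization strength): with only a few truly relevant features, the lasso zeroes out most coefficients while ridge merely shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]                  # only 3 of the 20 features actually matter
y = X @ beta_true + rng.normal(0, 1.0, 200)

lasso = Lasso(alpha=0.5).fit(X, y)                # alpha chosen arbitrarily for the demo
ridge = Ridge(alpha=0.5).fit(X, y)
print("lasso: exact zeros =", int(np.sum(lasso.coef_ == 0)), "of 20")   # the L1 "corners" give exact zeros
print("ridge: exact zeros =", int(np.sum(ridge.coef_ == 0)), "of 20")   # L2 shrinks but almost never to zero
```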

from: http://www.denizyuret.com/2014/02/machine-learning-in-5-pictures.html
