Everything You Wanted to Know About Machine Learning
That is why 张小龙 said, "Everything I say is wrong."
note by 王犇
- A set of possible models to look through
- A way to test whether a model is good
- A clever way to find a really good model with only a few tests
is to always measure the performance of your classifier on out-of-sample data.
testing splits. You should even make some predictions on data you imagine yourself, to see what the model does in certain situations.
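A minimal sketch of why out-of-sample measurement matters, assuming a toy dataset whose labels are coin flips unrelated to the input. The names `train_test_split`, `fit_memorizer`, and `accuracy` are hypothetical helpers written for this note, not any library's API. The "model" simply memorizes its training rows, so it looks perfect on the data it has seen and no better than guessing on held-out rows.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows and hold out a fraction the model never sees while fitting."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_memorizer(train):
    """A deliberately overfit 'model': memorize every training example."""
    table = {x: y for x, y in train}
    labels = [y for _, y in train]
    # Unseen inputs fall back to the majority label of the training set.
    default = max(set(labels), key=labels.count)
    return lambda x: table.get(x, default)

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

# Toy data (an assumption for illustration): x is an id, y is a coin flip.
rng = random.Random(42)
data = [(i, rng.randint(0, 1)) for i in range(200)]

train, test = train_test_split(data)
model = fit_memorizer(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)
print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

The gap between the two printed numbers is the point: only the held-out figure says anything about how the model behaves on data it has not seen.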
you add more and more input fields, you must also add more and more training data to “fill up” the space created
by the additional inputs if you want to use them accurately.
Otherwise, noise is very likely to make your model perform worse.
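One rough way to see the "fill up the space" problem, assuming each input field is cut into 4 bins and the training examples are spread uniformly at random. Both assumptions are purely illustrative, and `occupied_fraction` is a hypothetical helper. The sketch counts what fraction of the resulting grid cells contain any training data at all as fields are added.

```python
import random

def occupied_fraction(n_samples, n_fields, bins_per_field=4, seed=0):
    """Fraction of grid cells that contain at least one training example."""
    rng = random.Random(seed)
    cells = set()
    for _ in range(n_samples):
        # Each example lands in one cell of a grid with bins_per_field ** n_fields cells.
        cell = tuple(rng.randrange(bins_per_field) for _ in range(n_fields))
        cells.add(cell)
    return len(cells) / bins_per_field ** n_fields

for n_fields in (1, 2, 4, 8):
    frac = occupied_fraction(n_samples=1000, n_fields=n_fields)
    print(f"{n_fields} input fields: {frac:.1%} of cells have any data")
```

With the number of examples held fixed, the grid grows exponentially with the number of fields, so most cells end up empty and the model has nothing to learn from there.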
They Seem
know if an algorithm will model your data well is to try it out.
single input field, or even any single pair of fields, is closely correlated with the objective.
to create fields that make machine learning algorithms work better.
of the project’s time goes into feature engineering, 20% goes towards figuring out what comprises a proper and
comprehensive evaluation of the algorithm, and only 10% goes into algorithm selection and tuning.
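Getting from two latitude/longitude pairs to the distance between them takes a fairly complex transformation.
Once computed, that distance can correlate strongly with whether a user is willing to drive between the two cities on the same day.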
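The latitude/longitude example can be sketched with the haversine great-circle formula. This is one common choice, not necessarily the transformation the original article had in mind. The field names `lat_a`, `lon_a`, `lat_b`, `lon_b`, and `distance_km` are made up for illustration, and the Earth is treated as a sphere with a mean radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def add_distance_feature(row):
    """Return a copy of the row with an engineered 'distance_km' field added."""
    out = dict(row)
    out["distance_km"] = haversine_km(row["lat_a"], row["lon_a"], row["lat_b"], row["lon_b"])
    return out

# Hypothetical row: two points one degree of latitude apart on the same meridian.
trip = {"lat_a": 0.0, "lon_a": 0.0, "lat_b": 1.0, "lon_b": 0.0}
print(round(add_distance_feature(trip)["distance_km"], 1))  # about 111.2
```

None of the four raw coordinates predicts much on its own, but the single engineered field is something even a very simple threshold rule can use.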
good evidence that, in a lot of problems, very simple machine learning techniques can be leveraged into incredibly
powerful classifiers with the addition of loads of data.
A big reason for this is that, once you’ve defined your input fields, there’s only so much analytic gymnastics you can do. Computer algorithms trying to learn models have relatively few tricks they can do efficiently, and many of them are not very different. Thus, as we have said before, performance differences between algorithms are typically not large. If you want better classifiers, you should spend your time:
- Engineering better features
- Getting your hands on more high-quality data
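In fact, many models rest on similar underlying principles (think of the many Learning to Rank algorithms). So if you want a better classifier, those are the things to prioritize.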
a more powerful model by learning multiple classifiers over different random subsets of the data.
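Learning several classifiers over random subsets and combining them is commonly done by bagging: resample the data with replacement, fit one model per resample, and take a majority vote. The sketch below assumes one-dimensional inputs and uses a threshold "stump" as the base classifier. The toy data, `fit_stump`, and `fit_bagged` are all illustrative assumptions, not the method of any particular library.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Pick the threshold that misclassifies the fewest points; predict 1 when x > threshold."""
    best_t, best_err = None, None
    for t, _ in sample:
        err = sum((x > t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: int(x > t)

def fit_bagged(data, n_models=25, seed=0):
    """Fit one stump per bootstrap resample and combine them by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # bootstrap: sample with replacement
        models.append(fit_stump(sample))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict

# Toy data (an assumption): the true rule is x > 0.5, with about 10% of labels flipped.
rng = random.Random(1)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

predict = fit_bagged(data)
print(predict(0.1), predict(0.9))
```

An odd number of models avoids tied votes. Each stump sees a slightly different resample, so their individual mistakes tend to differ and the vote smooths them out.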
that fit the data equally well, many machine learning algorithms have a way of mathematically preferring the simpler of the two. The folk wisdom here is that a simpler model will perform better on out-of-sample testing data, because it has fewer parameters to fit, and thus is less likely to be overfit.
One should not take this rule too far. There are many places in machine learning where additional complexity can benefit performance. On top of that, it is not quite accurate to say that model complexity leads to overfitting. More accurate is that the procedure
used to fit all that complexity leads to overfitting if it is not very clever. But there are plenty of cases where the complexity is brought to heel by cleverness in the model fitting process.
Thus, prefer simple models because they are smaller, faster to fit, and more interpretable, but not necessarily because they will lead to better performance; the only way to know that is to evaluate your model on
test data.
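Do not trust this principle too readily, either. There are many places where extra complexity brings extra gains.
It is not accurate to say that an overly complex model causes overfitting. Sometimes the extra complexity is a deliberate and clever choice in model training (a complex structure may actually fit the problem better; if it performs only as well as a simple model, perhaps there is just not enough data yet).
So prefer simple models because they are smaller, easier to train, and easier to interpret, but not necessarily because they will perform better.
Only actual testing can tell you the answer.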
8. Representable Does Not Imply Learnable
are fond of saying that the function representing an accurate prediction on your data is representable by
the learning algorithm. This means that it is possible for
the algorithm to build a good model on your data.
by itself. Building a good model may require much more data than you have, or the good model might simply never be found by the algorithm. Just because there’s a good model out there that the algorithm could find
does not mean that it will find it.
If the algorithm can’t find a good model, but you are pretty sure that a good model exists, try engineering features that will make that model a little more obvious to the algorithm.
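A small illustration of making a model "more obvious" through a feature, under assumed toy data: the label is 1 exactly when two fields `a` and `b` have the same sign. The learner here is a single-threshold rule on one field, a deliberately weak hypothetical stand-in for "the algorithm". It cannot do much with either raw field, but it becomes exact once the product `a_times_b` is supplied as an engineered field.

```python
import random

def best_stump_accuracy(rows, field):
    """Best accuracy of any rule 'predict 1 when value > t' (or its negation) on one field."""
    best = 0.0
    for t in sorted({r[field] for r in rows}):
        hits = sum((r[field] > t) == bool(r["y"]) for r in rows)
        acc = hits / len(rows)
        best = max(best, acc, 1 - acc)
    return best

# Toy data (an assumption): y is 1 when a and b share a sign, 0 otherwise.
rng = random.Random(0)
rows = []
for _ in range(400):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    rows.append({"a": a, "b": b, "y": int(a * b > 0)})

# Engineered feature: the product carries the sign agreement directly.
for r in rows:
    r["a_times_b"] = r["a"] * r["b"]

for field in ("a", "b", "a_times_b"):
    print(f"{field}: {best_stump_accuracy(rows, field):.2f}")
```

A good model existed all along, but it only becomes findable by this learner after the feature is added.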
observational data can only show us that two variables are related, but it cannot
tell us the “why”.
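But books are not the cause of good grades; you cannot raise those children's grades by giving them books. The real reason may be that in households with many books the parents are better educated and give their own children a better education. The books are merely an indicator.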
your models. Just because one thing predicts another doesn’t mean it causes another, and making business (or
public policy) decisions based on some imagined causal relationship should be done with extreme caution.
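A toy simulation of this trap, assuming a made-up causal structure in which parental education drives both the number of books at home and the child's grade, while books have no direct effect on grades. Every number here is invented for illustration, and `simulate`, `correlation`, and `mean_grade` are hypothetical helpers. Books predict grades well in the observational data, yet handing out books changes nothing.

```python
import math
import random

def simulate(n=5000, give_books_to_all=False, seed=0):
    """Generate (books, grade) pairs under the assumed causal structure."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        parent_education = rng.gauss(0, 1)          # hidden common cause
        books = parent_education + rng.gauss(0, 1)  # caused by education
        if give_books_to_all:
            books += 2.0                            # intervention: hand out extra books
        grade = parent_education + rng.gauss(0, 1)  # caused by education, not by books
        rows.append((books, grade))
    return rows

def correlation(rows):
    """Pearson correlation between the two columns."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    cov = sum((x - mx) * (y - my) for x, y in rows)
    vx = sum((x - mx) ** 2 for x, _ in rows)
    vy = sum((y - my) ** 2 for _, y in rows)
    return cov / math.sqrt(vx * vy)

def mean_grade(rows):
    return sum(y for _, y in rows) / len(rows)

observed = simulate()
intervened = simulate(give_books_to_all=True)
print(f"correlation(books, grade): {correlation(observed):.2f}")
print(f"mean grade before: {mean_grade(observed):.3f}")
print(f"mean grade after giving books: {mean_grade(intervened):.3f}")
```

Both runs share a seed, so the only difference between them is the intervention itself, and the average grade does not move.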
Big Picture
and like any powerful tool, misuse of it can cause a lot of damage.
Understanding how machine learning works and some of the potential pitfalls can go a long way towards keeping you out of trouble.