Everything You Wanted to Know About Machine Learning
That is why Zhang Xiaolong said, 'Everything I say is wrong.'
note by 王犇
- A set of possible models to look through
- A way to test whether a model is good
- A clever way to find a really good model with only a few tests
is to always measure the performance of your classifier on out-of-sample data.
testing splits. You should even make some predictions on data you imagine yourself, to see what the model does in certain situations.
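A minimal sketch of out-of-sample measurement. The toy 1-nearest-neighbour classifier, the synthetic data, and every function name here are assumptions made for illustration, not part of the original article. The same model is scored on the rows it was fit on and on held-out rows.

```python
import random

def make_data(n, seed):
    """Synthetic 1-D rows: label is 1 when x > 0.5, with 20% of labels flipped as noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.2:
            label = 1 - label
        rows.append((x, label))
    return rows

def predict_1nn(train, x):
    """Return the label of the training row whose x is closest to the query."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(train, rows):
    """Fraction of rows whose label the 1-NN model predicts correctly."""
    return sum(predict_1nn(train, x) == y for x, y in rows) / len(rows)

data = make_data(400, seed=0)
train, test = data[:300], data[300:]   # rows are i.i.d., so a plain slice is a fair split

in_sample = accuracy(train, train)     # the model has memorised these rows
out_of_sample = accuracy(train, test)  # rows the model never saw
print(f"in-sample: {in_sample:.2f}, out-of-sample: {out_of_sample:.2f}")
```

The in-sample score looks perfect only because each training row is its own nearest neighbour; the held-out score is the one that says something about new data.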
you add more and more input fields, you must also add more and more training data to “fill up” the space created
by the additional inputs if you want to use them accurately.
Otherwise, noise is very likely to make your model perform worse.
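One rough way to see the "fill up the space" problem: keep the number of training points fixed and count how much of the input space they touch as fields are added. The grid of 10 bins per field and the 1,000 uniform random points are assumptions chosen for illustration only.

```python
import random

def coverage(n_points, n_dims, bins_per_dim, rng):
    """Fraction of grid cells that contain at least one random point."""
    occupied = {
        tuple(int(rng.random() * bins_per_dim) for _ in range(n_dims))
        for _ in range(n_points)
    }
    return len(occupied) / bins_per_dim ** n_dims

rng = random.Random(0)
# Same 1,000 points each time; only the number of input fields changes.
fractions = {d: coverage(1000, d, 10, rng) for d in range(1, 5)}
for d, fraction in fractions.items():
    print(f"{d} field(s): {fraction:.3f} of cells contain data")
```

With four fields there are 10,000 cells, so 1,000 points can reach at most a tenth of them, however they fall.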
They Seem
know if an algorithm will model your data well is to try it out.
single input field, or even any single pair of fields, is closely correlated with the objective.
to create fields that make machine learning algorithms work better.
of the project’s time goes into feature engineering, 20% goes towards figuring out what comprises a proper and
comprehensive evaluation of the algorithm, and only 10% goes into algorithm selection and tuning.
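Getting from two latitude/longitude pairs to the distance between them takes a fairly complex transformation.
Once transformed, though, the distance can correlate strongly with whether a user is willing to drive between the two cities on the same day.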
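The latitude/longitude transformation can be sketched with the haversine formula, which gives the great-circle distance on a sphere. The field names and the example coordinates (roughly Beijing and Shanghai) are hypothetical, and a real project might prefer road distance or driving time instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treating the Earth as a sphere is an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical raw row: four coordinates that say little about the objective one at a time.
row = {"lat_a": 39.9042, "lon_a": 116.4074, "lat_b": 31.2304, "lon_b": 121.4737}

# Engineered field: one number a learner can relate directly to "would drive there today".
row["distance_km"] = haversine_km(row["lat_a"], row["lon_a"], row["lat_b"], row["lon_b"])
print(round(row["distance_km"]))
```

No single coordinate predicts the objective, but the derived `distance_km` field can, which is the point of engineering it by hand.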
good evidence that, in a lot of problems, very simple machine learning techniques can be leveraged into incredibly
powerful classifiers with the addition of loads of data.
A big reason for this is that, once you’ve defined your input fields, there’s only so much analytic gymnastics you can do. Computer algorithms trying to learn models have relatively few tricks they can perform efficiently, and many of those tricks are not very different from one another. So, as we have said before, performance differences between algorithms are typically not large. If you want better classifiers, you should spend your time:
- Engineering better features
- Getting your hands on more high-quality data
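In fact, the principles behind many models have a lot in common (think of the many Learning to Rank algorithms). So if you want a better classifier, put the two items above first.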
a more powerful model by learning multiple classifiers over different random subsets of the data.
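Learning several classifiers over random subsets of the data is the idea behind bagging. A minimal sketch, assuming one-field threshold "stumps" as the base classifier and bootstrap samples as the random subsets; all names and the synthetic data are hypothetical. It shows only the mechanics and does not claim that bagging helps on a problem this simple.

```python
import random
from collections import Counter

def make_rows(n, rng):
    """Synthetic rows: label is 1 when x > 0.5, with 10% of labels flipped as noise."""
    rows = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        rows.append((x, y))
    return rows

def fit_stump(rows):
    """Pick the threshold t that misclassifies the fewest rows under 'predict 1 when x > t'."""
    best_t, best_err = None, None
    for t, _ in rows:
        err = sum(int(x > t) != y for x, y in rows)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_fit(rows, n_models, rng):
    """Fit one stump per bootstrap sample (rows drawn with replacement)."""
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_models)]

def bagged_predict(thresholds, x):
    """Majority vote across the stumps; an odd number of stumps avoids ties."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
rows = make_rows(200, rng)
ensemble = bagged_fit(rows, n_models=25, rng=rng)
print(bagged_predict(ensemble, 0.1), bagged_predict(ensemble, 0.9))
```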
that fit the data equally well, many machine learning algorithms have a way of mathematically preferring the simpler of the two. The folk wisdom here is that a simpler model will perform better on out-of-sample testing data, because it has fewer parameters to fit, and thus is less likely to be overfit.
One should not take this rule too far. There are many places in machine learning where additional complexity can benefit performance. On top of that, it is not quite accurate to say that model complexity leads to overfitting. More accurate is that the procedure
used to fit all that complexity leads to overfitting if it is not very clever. But there are plenty of cases where the complexity is brought to heel by cleverness in the model fitting process.
Thus, prefer simple models because they are smaller, faster to fit, and more interpretable, but not necessarily because they will lead to better performance; the only way to know that is to evaluate your model on
test data.
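Don't trust this principle too much, either. There are many places where extra complexity brings extra benefit.
It is not accurate to say that an overly complex model causes overfitting. Sometimes the extra complexity is a deliberate and clever choice made during model training (a complex structure may fit the problem better; if it only matches a simple model, it may just be that there is not enough data yet).
So prefer simple models because they are smaller, easier to train, and easier to interpret, but not necessarily because they will perform better.
Only actual testing can tell you the answer.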
8. Representable Does Not Imply Learnable
-- what can be represented cannot necessarily be learned
are fond of saying that the function representing an accurate prediction on your data is representable by
the learning algorithm. This means that it is possible for
the algorithm to build a good model on your data.
by itself. Building a good model may require much more data than you have, or the good model might simply never be found by the algorithm. Just because there’s a good model out there that the algorithm could find
does not mean that it will find
it.
If the algorithm can’t find a good model, but you are pretty sure that a good model exists, try engineering features that will make that model a little more obvious to the algorithm.
observational data can only show us that two variables are related, but it cannot
tell us the “why”.
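But the books are not the cause of the good grades; you cannot raise those children's grades just by giving them books. The real reason may be that in homes with many books the parents are better educated and tend to educate their children better. The books are merely an indicator.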
your models. Just because one thing predicts another doesn’t mean it causes another, and making business (or
public policy) decisions based on some imagined causal relationship should be done with extreme caution.
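The books-and-grades example can be sketched as a small simulation. Everything here is a made-up model for illustration: parent education drives both the number of books and the grades, and books are given no direct effect on grades at all.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n, rng, gift_books=0.0):
    """Hidden cause (education) feeds both fields; gift_books is an intervention on books only."""
    books, grades = [], []
    for _ in range(n):
        education = rng.gauss(0, 1)
        books.append(education + rng.gauss(0, 1) + gift_books)
        grades.append(education + rng.gauss(0, 1))
    return books, grades

# Observational data: books predict grades reasonably well.
books, grades = simulate(5000, random.Random(0))
r = pearson(books, grades)
print(f"correlation between books and grades: {r:.2f}")

# Intervention: same families (same seed), but every home is handed extra books.
_, grades_after = simulate(5000, random.Random(0), gift_books=10.0)
print("grades changed by the gift:", grades_after != grades)
```

The correlation is real and useful for prediction, yet handing out books changes nothing in this model, because the predictive signal came entirely from the hidden common cause.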
Big Picture
and like any powerful tool, misuse of it can cause a lot of damage.
Understanding how machine learning works and some of the potential pitfalls can go a long way towards keeping you out of trouble.