Everything You Wanted to Know About Machine Learning
That is why Zhang Xiaolong (张小龙) said, "Everything I say is wrong."
note by 王犇
- A set of possible models to look through
- A way to test whether a model is good
- A clever way to find a really good model with only a few tests
is to always measure the performance of your classifier on out-of-sample data.
testing splits. You should even make some predictions on data you imagine yourself, to see what the model does in certain situations.
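A minimal sketch of why out-of-sample measurement matters, assuming a toy 1-nearest-neighbor classifier and synthetic noisy data (the data generator, the 75/25 split, and all function names are illustrative assumptions, not from the original article). The model memorizes its training set, so in-sample accuracy looks perfect while held-out accuracy tells a more honest story.

```python
import random

def nearest_neighbor_predict(train, point):
    """Return the label of the training example closest to `point` (1-NN)."""
    closest = min(train, key=lambda ex: (ex[0] - point) ** 2)
    return closest[1]

def accuracy(train, examples):
    """Fraction of `examples` whose label the 1-NN model predicts correctly."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in examples)
    return hits / len(examples)

def make_noisy_data(n, rng, flip_prob=0.2):
    """Toy data: the label is 1 when x > 0.5, but each label is flipped
    with probability `flip_prob` to simulate noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < flip_prob:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
data = make_noisy_data(400, rng)
train, test = data[:300], data[300:]   # hold out 25% as out-of-sample data

train_acc = accuracy(train, train)     # scored on points the model memorized
test_acc = accuracy(train, test)       # scored on points the model never saw
print(f"in-sample accuracy:     {train_acc:.2f}")
print(f"out-of-sample accuracy: {test_acc:.2f}")
```

The gap between the two numbers is the part of the in-sample score that came from memorizing noise rather than learning the pattern.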
you add more and more input fields, you must also add more and more training data to “fill up” the space created
by the additional inputs if you want to use them accurately.
Otherwise, noise is very likely to make your model perform worse.
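One rough way to see how quickly the input space outgrows the data is to cut each field into a fixed number of bins and count cells. The sketch below is only an upper bound under that assumption (the function name and the choice of 10 bins per field are illustrative): it reports the largest fraction of cells that a given number of training examples could possibly occupy.

```python
def max_coverage(num_examples, num_fields, bins_per_field=10):
    """Upper bound on the fraction of input-space cells that can contain
    at least one training example when each field is cut into bins."""
    num_cells = bins_per_field ** num_fields
    return min(1.0, num_examples / num_cells)

# With 1000 examples, three fields can still be fully covered,
# but six fields leave at least 99.9% of the cells empty.
for num_fields in (1, 2, 3, 6):
    print(num_fields, max_coverage(1000, num_fields))
```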
They Seem
know if an algorithm will model your data well is to try it out.
single input field, or even any single pair of fields, is closely correlated with the objective.
to create fields that make machine learning algorithms work better.
of the project’s time goes into feature engineering, 20% goes towards figuring out what comprises a proper and
comprehensive evaluation of the algorithm, and only 10% goes into algorithm selection and tuning.
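Going from two latitude/longitude pairs to the distance between them takes a fairly complex transformation.
Once transformed, that distance can be strongly correlated with whether a user is willing to drive between the two cities on the same day.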
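The latitude/longitude note above can be sketched as a small feature-engineering step. This is one possible transformation, assuming the haversine great-circle formula and a spherical Earth; the field names (`lat_a`, `lon_a`, `lat_b`, `lon_b`, `distance_km`) are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def add_distance_feature(row):
    """Return a copy of `row` with an engineered `distance_km` field."""
    out = dict(row)
    out["distance_km"] = haversine_km(
        row["lat_a"], row["lon_a"], row["lat_b"], row["lon_b"]
    )
    return out

# Hypothetical record: approximate coordinates of Beijing and Shanghai.
trip = {"lat_a": 39.9042, "lon_a": 116.4074, "lat_b": 31.2304, "lon_b": 121.4737}
print(add_distance_feature(trip)["distance_km"])
```

A learner that sees only the four raw coordinates has to discover this formula on its own; handing it `distance_km` directly makes the relationship with "will drive in one day" far easier to pick up.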
good evidence that, in a lot of problems, very simple machine learning techniques can be levered into incredibly
powerful classifiers with the addition of loads of data.
A big reason for this is that, once you've defined your input fields, there's only so much analytic gymnastics you can do. Computer algorithms trying to learn models have relatively few tricks they can do efficiently, and many of those tricks are not very different from one another. As we have said before, performance differences between algorithms are therefore typically not large. So if you want better classifiers, you should spend your time:
- Engineering better features
- Getting your hands on more high-quality data
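In fact, many models work on similar underlying principles (think of the many Learning to Rank algorithms). So if you want a better classifier, prioritize the two items above.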
a more powerful model by learning multiple classifiers over different random subsets of the data.
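The idea of learning several classifiers over different random subsets of the data can be sketched as follows. This version assumes bootstrap sampling (drawing with replacement) and one-threshold "stump" classifiers combined by majority vote, which is one common way to do it; the toy data and every name here are illustrative.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-threshold classifier that predicts 1 when x > threshold.
    Tries each x in the sample as the threshold and keeps the most accurate."""
    best_threshold, best_hits = None, -1
    for threshold, _ in sample:
        hits = sum(int(x > threshold) == y for x, y in sample)
        if hits > best_hits:
            best_threshold, best_hits = threshold, hits
    return best_threshold

def fit_bagged_stumps(data, num_models, rng):
    """Train each stump on its own random resample of the data."""
    models = []
    for _ in range(num_models):
        sample = [rng.choice(data) for _ in data]  # bootstrap: with replacement
        models.append(fit_stump(sample))
    return models

def predict(models, x):
    """Majority vote over all stumps (use an odd count to avoid ties)."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]

# Toy data: label is 1 when x > 0.5, with 10% of labels flipped as noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    data.append((x, 1 - y if rng.random() < 0.1 else y))

models = fit_bagged_stumps(data, 11, rng)
print(predict(models, 0.9), predict(models, 0.1))
```

Each stump sees a slightly different sample, so their individual thresholds differ; the vote averages out some of that variation.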
that fit the data equally well, many machine learning algorithms have a way of mathematically preferring the simpler of the two. The folk wisdom here is that a simpler model will perform better on out-of-sample testing data because it has fewer parameters to fit and is thus less likely to overfit.
One should not take this rule too far. There are many places in machine learning where additional complexity can benefit performance. On top of that, it is not quite accurate to say that model complexity leads to overfitting. It is more accurate to say that the procedure used to fit all that complexity leads to overfitting if it is not very clever. But there are plenty of cases where the complexity is brought to heel by cleverness in the model fitting process.
Thus, prefer simple models because they are smaller, faster to fit, and more interpretable, but not necessarily because they will lead to better performance; the only way to know that is to evaluate your model on test data.
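This principle should not be trusted too much either. There are many places where extra complexity brings extra benefit.
It is not accurate to say that an overly complex model causes overfitting. Sometimes the extra complexity is a deliberate and clever choice in model training (a complex structure may fit the problem better yet perform only as well as a simple model, perhaps just because there is not yet enough data).
So prefer simple models because they are smaller, easier to train, and easier to interpret, but not necessarily because they will perform better.
Only actual testing can tell you the answer.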
8. Representable Does Not Imply Learnable
are fond of saying that the function representing an accurate prediction on your data is representable by the learning algorithm. This means that it is possible for the algorithm to build a good model on your data.
by itself. Building a good model may require much more data than you have, or the good model might simply never be found by the algorithm. Just because there's a good model out there that the algorithm could find does not mean that it will find it.
If the algorithm can’t find a good model, but you are pretty sure that a good model exists, try engineering features that will make that model a little more obvious to the algorithm.
observational data can only show us that two variables are related, but it cannot
tell us the “why”.
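But books are not the cause of good grades; you cannot raise those children's grades by sending them books. The real cause may be that parents in households with many books are better educated and, in turn, educate their own children better. Books are merely an indicator.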
your models. Just because one thing predicts another doesn’t mean it causes another, and making business (or
public policy) decisions based on some imagined causal relationship should be done with extreme caution.
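The books-and-grades point can be made concrete with a small simulation. This is a toy world built on stated assumptions, not real data: a hidden `education` variable drives both the number of books at home and the child's grades, while books have no effect on grades at all. All coefficients and names are made up for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
education = [rng.gauss(0, 1) for _ in range(2000)]           # hidden common cause
books = [50 + 20 * e + rng.gauss(0, 5) for e in education]   # driven by education
grade_noise = [rng.gauss(0, 5) for _ in education]

def grades_given(books_owned):
    """Assumption of this toy world: grades depend on parental education
    and noise only, so `books_owned` is deliberately ignored."""
    return [70 + 10 * e + n for e, n in zip(education, grade_noise)]

grades = grades_given(books)
print("correlation(books, grades):", round(pearson(books, grades), 2))

# Intervention: hand every household 30 extra books, then re-observe grades.
grades_after = grades_given([b + 30 for b in books])
mean_shift = sum(a - b for a, b in zip(grades_after, grades)) / len(grades)
print("mean change in grades after giving books:", mean_shift)
```

Books predict grades well in the observed data, yet changing the number of books changes nothing, because the predictive link runs entirely through the hidden common cause.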
Big Picture
and like any powerful tool, misuse of it can cause a lot of damage.
Understanding how machine learning works and some of the potential pitfalls can go a long way towards keeping you out of trouble.