Introduction to Machine Learning
Chapter 1 Introduction
1.1 What Is Machine Learning?
To solve a problem on a computer, we need an algorithm. An algorithm is a sequence of instructions that should be carried out to transform the input to output. For example, one can devise an algorithm for sorting. The input is a set of numbers and the output is their ordered list. For the same task, there may be various algorithms and we may be interested in finding the most efficient one, requiring the least number of instructions or memory or both.
For some tasks, however, we do not have an algorithm - for example, to tell spam emails from legitimate email. We know what the input is: an email document that in the simplest case is a file of characters. We know what the output should be: a yes/no output indicating whether the message is spam or not. We do not know how to transform the input to the output. What can be considered spam changes in time and from individual to individual.
What we lack in knowledge, we make up for in data. We can easily compile thousands of example messages some of which we know to be spam and what we want is to “learn” what constitutes spam from them. In other words, we would like the computer (machine) to extract automatically the algorithm for this task. There is no need to learn to sort numbers, we already have algorithms for that; but there are many applications for which we do not have an algorithm but do have example data.
With advances in computer technology, we currently have the ability to store and process large amounts of data, as well as to access it from physically distant locations over a computer network. Most data acquisition devices are digital now and record reliable data. Think, for example, of a supermarket chain that has hundreds of stores all over a country selling thousands of goods to millions of customers. The point of sale terminals record the details of each transactions: date, customer identification code, goods bought and their amount, total money spent, and so forth. This typically amounts to gigabytes of data every day. What the supermarket chain wants it to be able to predict who are the likely customers for a product. Again, the algorithm for this is not evident; it changes in time and by geographic location. The stored data becomes useful only when it is analyzed and turned into information that we can make use of, for example, to make predictions.
We may not be able to identify the process completely, but we believe we can construct a good and useful approximation. That approximation may not explain everything, but may still be able to account for some part of the data. We believe that thought identifying the complete process may not be possible, we can still detect certain patterns or regularities. This is the niche of machine learning. Such patterns may help us understand the process, or we can use those patterns to make predictions: Assuming that the future, at least the near future, will not be much different from the past when the sample data was collected, the future predictions can also be expected to be right.
Application of machine learning methods to large databases is called data mining. The analogy is that a large volume of each and raw material is extracted from a mine, which when processed leads to a small amount of very precious material; similarly, in data mining, a large volume of data is processed to construct a simple model with valuable use, for example, having high predictive accuracy. Its application areas are abundant: In addition to retail, in finance banks analyze their past data to build models to use in credit applications, fraud detection, and the stock market.
1.2.5 Reinforcement Learning
In some applications, the output of the system is a sequence of action. In such a case, a single action is not important; what is important is the policy that is the sequence of correct actions to reach the goal. There is no such thing as the best action in any intermediate state; an action is good if it is part of a good policy. In such a case, the machine learning program should be able to assess the goodness of policies and learn from past good action sequences to be able to generate a policy. Such learning methods are called reinforcement learning algorithms.
Chapter 2 Supervised Learning
We discuss supervised learning starting from the simplest case, which is learning a class from its positive and negative examples. We generalize and discuss the case of multiple classes, then regression, where the outputs are continuous.
2.1 Learning a Class from Examples
Let us say we want to learn the class, C, of a “family car”. We have a set of examples of cars, and we have a group of people that we survey to whom we show these cars. The people look at the cars and label them; the cars that they believe are family cars are positive examples, and the other cars are negative examples. Class learning is finding a description that is shared by all positive examples. Class learning is finding a description that is shared by all positive examples and none of the negative examples. Doing this, we can make a prediction: Given a car that we have not seen before, by checking with the description learned, we will be able to say whether it is a family car or not. Or we can do knowledge extraction: This study may be sponsored by a car company, and the aim may be to understand what people expect from a family car.
Chapter 3 Bayesian Decision Theory
We discuss probability theory as the framework for making decisions under uncertainty. In classification, Bayes' rule is used to calculate the probabilities of the classes. We generalize to discuss how we can make rational decisions among multiple actions to minimize expected risk. We also discuss learning association rules from data.
3.1 Introduction
Programming computers to make inference from data is a cross between statistics and computer science, where statisticians provide the mathematical framework of making inference from data and computer scientists work on the efficient implementation of the inference methods.
Data comes from a process that is not completely known. This lack of knowledge is indicated by modeling the process as a random process. Maybe the process is actually deterministic, but because we do not have access to complete knowledge about it, we model it as random and use probability theory to analyze it. At this point, it may be a good idea to jump the appendix and review basic probability theory before continuing with this chapter.
Chapter 4 Parametric Methods
Having discussed how to make optimal decisions when the uncertainty is modeled using probabilities, we now see how we can estimate these probabilities from a given training set. We start with the parametric approach for classification and regression. We discuss the semiparametric and non parametric approaches in later chapters. We introduce bias/variance dilemma and model selection methods for trading off model complexity and empirical error.
4.1 Introduction
A statistic is any value that is calculated from a given sample. In statistical inference, we make a decision using the information provided by a sample.
Introduction to Machine Learning的更多相关文章
- ML Lecture 0-1: Introduction of Machine Learning
本博客是针对李宏毅教授在Youtube上上传的课程视频<ML Lecture 0-1: Introduction of Machine Learning>的学习笔记.在Github上也po ...
- 【Machine Learning is Fun!】1.The world’s easiest introduction to Machine Learning
Bigger update: The content of this article is now available as a full-length video course that walks ...
- Introduction of Machine Learning
李宏毅主页 台湾大学语音处理实验室 人工智慧.机器学习与深度学习间有什么区别? 人工智能——目标 机器学习——手段 深度学习——机器学习的一种方法 人类设定好的天生本能 Machine Learnin ...
- 李宏毅老师机器学习课程笔记_ML Lecture 0-1: Introduction of Machine Learning
引言: 最近开始学习"机器学习",早就听说祖国宝岛的李宏毅老师的大名,一直没有时间看他的系列课程.今天听了一课,感觉非常棒,通俗易懂,而又能够抓住重点,中间还能加上一些很有趣的例子 ...
- Introduction To Machine Learning Self-Evaluation Test
Preface Section 1 - Mathematical background Multivariate calculus take derivatives and integrals; de ...
- Machine Learning Algorithms Study Notes(1)--Introduction
Machine Learning Algorithms Study Notes 高雪松 @雪松Cedro Microsoft MVP 目 录 1 Introduction 1 1.1 ...
- 【机器学习Machine Learning】资料大全
昨天总结了深度学习的资料,今天把机器学习的资料也总结一下(友情提示:有些网站需要"科学上网"^_^) 推荐几本好书: 1.Pattern Recognition and Machi ...
- Machine Learning Algorithms Study Notes(6)—遗忘的数学知识
机器学习中遗忘的数学知识 最大似然估计( Maximum likelihood ) 最大似然估计,也称为最大概似估计,是一种统计方法,它用来求一个样本集的相关概率密度函数的参数.这个方法最早是遗传学家 ...
- [Python & Machine Learning] 学习笔记之scikit-learn机器学习库
1. scikit-learn介绍 scikit-learn是Python的一个开源机器学习模块,它建立在NumPy,SciPy和matplotlib模块之上.值得一提的是,scikit-learn最 ...
随机推荐
- hiho_1089_floyd最短路
题目 floyd算法求所有顶点之间的最短路,典型的模板题.唯一需要注意的是两个顶点之间可能有多条边直接相连,在初始化的时候,直接选择最小的长度作为两点间的距离即可. 实现 #include<io ...
- hiho_1041 国庆出游
题目 给定一棵树,N个节点,N - 1条边.给定m个节点,能否找出一种遍历方法,使得首次到达节点ai的时间小于首次到达节点aj的时间(i < j).且经过的路径上的每条边都最多走两遍 分析 我的 ...
- Python策略模式实现源码分享
1.让一个对象的某个方法可以随时改变,而不用更改对象的代码 2.对于动态类型的Python语言,不需要定义接口 3.基本的实现方法:用类作为参数传递 例如: 12_eg3.py class Movea ...
- 游戏引擎/GUI的设计与实现-序
几年前写<嵌入式GUI FTK设计与实现>,没写几篇就停止更新了.当时自己研究过MicroWindows, X Window, DirectFB, GTK+和Android的GUI,又写过 ...
- noip2016赛后总结
面前并不是一颗变质的心. 只是一种综合并适应一切的情怀. 这或许是最好的心态... 今年的noip貌似考得好不理想呢...彻底挂了... gjs580,hgr555. 一个初三学弟570,一个400+ ...
- CentOS安装vim
VMware下CentOS安装成功后,默认自带vi,但vi功能没vim丰富.以下为CentOS中安装vim: 用yum产看源中的vim安装包: [xi@localhost ~]$ yum search ...
- 四位专家分享:18个网站SEO建议
搜索引擎优化(简称SEO)对于互联网新创企业来说很重要.下面是四位相关专家给出的建议. 第一位专家是Autotrader公司的搜索市场经理Dewi Nawasari,她认为SEO就是优化网站,以吸引你 ...
- 《javascript高级程序设计》第六章 Object Creation VS Inheritance
6.1 理解对象 6.1.1 属性类型 6.1.2 定义多个属性 6.1.3 读取属性的特性6.2 创建对象 6.2.1 工厂模式 6.2.2 构造函数模式 6.2.3 原型模式 6.2.4 组合使用 ...
- python中join的用法
str.join(sequence) # 将序列中的元素以str字符连接生成一个新的字符串 list1 = ['a', 'b', 'c'] new_str = '-'.join(list1) # 输出 ...
- mint上部署lamp环境
不得不说现在在linux mint上部署lamp很方便,比windows服务器上的asp.net的部署升级都简单. 1 安装MySql sudo apt-get install mysql-serve ...