Google's Machine Learning Crash Course #01# Introducing ML & Framing & Fundamental terminology

INDEX

Introducing ML
Framing
Fundamental machine learning terminology

Introducing ML

What you learn here will allow you, as a software engineer, to do three things better.

First, it gives you a tool to reduce the time you spend programming.
Second, it will allow you to customize your products, making them better for specific groups of people.
Third, machine learning lets you solve problems that you, as a programmer, have no idea how to do by hand.

Now, besides these three practical reasons for mastering machine learning, there's a philosophical reason: machine learning changes the way you think about a problem.

As software engineers, we are trained to think logically and mathematically: we use assertions to prove that properties of our program are correct. With machine learning, the focus shifts from a mathematical science to a natural science: we're making observations about an uncertain world, running experiments, and using statistics, not logic, to analyze the results of the experiment. The ability to think like a scientist will expand your horizons and open up new areas that you couldn't explore without it.

Framing

Hi, my name is D. Sculley.
I'm one of the people who is coming to you from Google in order to present this Machine Learning Crash Course with TensorFlow APIs.
Now, before we dive in, let's take a second to remind ourselves of the basic framework that we are talking about in this class.
And that basic framework is supervised machine learning.
In supervised machine learning, we are learning to create models that combine inputs to produce useful predictions, even on previously unseen data.
Now, when we're training that model, we're providing it with labels.
And in the case of, say, email spam filtering, that label might be something like 'spam or not spam'.
It's the target that we're trying to predict.
The features are the way that we represent our data.
So features might be drawn from an email as, say, words in the email, the "to" and "from" addresses, various pieces of routing or header information, any piece of information that we might extract from that email to represent it for our machine learning system.
An example is one piece of data.
For example, one email.
Now that could be a labeled example, in which we have both the feature information represented in that email and the label value of 'spam or not spam'.
Maybe that's come from a user who has provided that to us.
Or we could have an unlabeled example, such as a piece of email for which we have feature information, but we don't yet know whether it is spam or not spam.
And likely what we are going to do is classify that to put it in the user's inbox or spam folder.
Finally, we have a model and that model is the thing that is doing the predicting.
It's something that we're going to try and create through a process of learning from data.

Fundamental machine learning terminology

Labels

A label is the thing we are predicting. In a spam filtering system, for example, the label is something like "spam" or "not spam".

We denote it by y.

Features

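A feature is an input variable. In a spam filtering system, features might include the words in the email text, the sender's address, the time the email was sent, and so on.

A simple machine learning system might use a single feature, while a complex one might use millions of features.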
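
As a rough illustration of what extracting features might look like, the sketch below turns one email into a small dictionary of features. The `extract_features` function and the email field names (`sender`, `body`, `hour_sent`) are hypothetical, chosen only for this example; a real spam filter would use a far richer representation.

```python
def extract_features(email):
    """Turn a raw email (a plain dict here) into a feature dictionary.

    The field names are assumptions made for this sketch.
    """
    words = email["body"].lower().split()
    return {
        "words": set(words),                               # words in the message text
        "sender_domain": email["sender"].split("@")[-1],   # taken from the sender address
        "hour_sent": email["hour_sent"],                   # time the message was sent
        "num_words": len(words),                           # length of the message in words
    }


email = {
    "sender": "promo@example.com",
    "body": "Win a FREE prize now",
    "hour_sent": 3,
}
features = extract_features(email)
```

Each key in `features` is one feature; a simple system might keep only one of them, while a large one might track every distinct word as its own feature.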

Examples

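An example is a particular instance of data. Examples fall into two kinds: labeled examples and unlabeled examples.

In a spam filtering system, a labeled example is an email that has been explicitly marked as spam or not spam, while an unlabeled example is an email that has not yet been marked either way.

We train the model on labeled examples, then use the model to classify emails that have not been labeled.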
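
One way to picture the difference is a record whose label may be missing. This is only a sketch: the `Example` class and the feature names `contains_free` and `num_links` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Example:
    features: Dict[str, float]    # the inputs
    label: Optional[str] = None   # the target y; None means the example is unlabeled

    @property
    def is_labeled(self) -> bool:
        return self.label is not None


labeled = Example(features={"contains_free": 1.0, "num_links": 4.0}, label="spam")
unlabeled = Example(features={"contains_free": 0.0, "num_links": 1.0})

# Only labeled examples can be used for training.
training_set = [ex for ex in (labeled, unlabeled) if ex.is_labeled]
```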

Models

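A model defines the relationship between features and label. In a spam filtering system, certain features might be strongly associated with spam.

The two phases of a model's life cycle:

Training (also called learning the model): put simply, we show the model labeled examples that have already been classified, so that it gradually learns the relationship between features and label.
Inference: we apply the trained model to unlabeled examples that have not been classified, to predict their labels.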
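
The two phases can be sketched with a deliberately naive word-count scorer. This is not a method the course prescribes; the `train` and `infer` functions and the tiny data set are assumptions made purely to show where labels are used (training) and where they are not (inference).

```python
from collections import Counter


def train(labeled_examples):
    """Training: count how often each word appears in spam vs. non-spam emails."""
    spam_counts, ham_counts = Counter(), Counter()
    for words, label in labeled_examples:
        target = spam_counts if label == "spam" else ham_counts
        target.update(words)
    return spam_counts, ham_counts


def infer(model, words):
    """Inference: predict a label for an unlabeled example from the learned counts."""
    spam_counts, ham_counts = model
    spam_score = sum(spam_counts[w] for w in words)
    ham_score = sum(ham_counts[w] for w in words)
    return "spam" if spam_score > ham_score else "not spam"


labeled_examples = [
    (["win", "free", "prize"], "spam"),
    (["free", "offer", "now"], "spam"),
    (["meeting", "at", "noon"], "not spam"),
    (["lunch", "at", "noon"], "not spam"),
]
model = train(labeled_examples)                         # phase 1: training
prediction = infer(model, ["free", "prize", "inside"])  # phase 2: inference
```

Here the "model" is nothing more than two word-count tables, but the shape is the same as in a real system: training consumes features together with labels, and inference consumes features alone.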

Regression vs. classification

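A regression model predicts continuous values. For example: What will the temperature be tomorrow? What is the price of a house in Beijing?

A classification model predicts discrete values. For example: Is the picture of a cat or a dog? Is this email spam or not spam?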
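
The contrast shows up in the type of the prediction. In the sketch below, the least-squares `fit_line` helper, the floor areas and prices, and the `classify` threshold are all invented for illustration and are not taken from the course.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


def classify(spam_score, threshold=0.5):
    """Map a score to one of two discrete labels."""
    return "spam" if spam_score >= threshold else "not spam"


# Regression: the prediction is a continuous value.
areas = [50.0, 80.0, 100.0]       # hypothetical floor areas
prices = [150.0, 240.0, 300.0]    # hypothetical prices, exactly 3 * area
w, b = fit_line(areas, prices)
predicted_price = w * 70.0 + b    # a real number

# Classification: the prediction is one of a fixed set of labels.
predicted_label = classify(0.8)
```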