/ 20220404 Week 1 - 2 /

Chapter 1 - Introduction

1.1 Definition

Arthur Samuel

The field of study that gives computers the ability to learn without being explicitly programmed.

Tom Mitchell

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

1.2 Concepts

1.2.1 Classification of Machine Learning

Supervised Learning: we are given a labeled data set, so we already know what a correct output should look like
- Regression: continuous output
- Classification: discrete output
Unsupervised Learning: we are given an unlabeled data set, or a data set in which every example has the same label, and must find the structure in the data ourselves
- Clustering: group the data into clusters
- Non-Clustering
Others: Reinforcement Learning, Recommender Systems...

1.2.2 Model Representation

Training Set

\[\begin{matrix}
x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n&&y^{(1)}\\
x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n&&y^{(2)}\\
\vdots&\vdots&\ddots&\vdots&&\vdots\\
x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n&&y^{(m)}
\end{matrix}\]

Notation

$m=$ the number of training examples (the number of rows)

$n=$ the number of features (the number of columns, not counting $y$)

$x=$ input variable/feature

$y=$ output variable/target variable

$(x^{(i)},y^{(i)})$: the $i$-th training example, where $x^{(i)}_j$ is the value of its $j$-th feature, $i=1, ..., m$ and $j=1, ..., n$
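
To make the indexing concrete, here is a minimal sketch of how a training set might be stored as plain Python lists. The numbers are made up for illustration, and the helper names `feature` and `target` are hypothetical; they only translate the 1-based notation above into 0-based list indices.

```python
# Made-up training set with m = 3 examples and n = 2 features.
# Each row of X is one example x^(i); y holds the matching targets y^(i).
X = [
    [2104.0, 3.0],
    [1600.0, 3.0],
    [2400.0, 4.0],
]
y = [400.0, 330.0, 369.0]

m = len(X)     # number of training examples (rows)
n = len(X[0])  # number of features (columns)


def feature(i, j):
    """Return x^(i)_j using the 1-based indices of the notation."""
    return X[i - 1][j - 1]


def target(i):
    """Return y^(i) using the 1-based index of the notation."""
    return y[i - 1]


print(m, n)           # 3 2
print(feature(2, 1))  # 1600.0
print(target(3))      # 369.0
```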

1.2.3 Cost Function

1.2.4 Gradient Descent

Chapter 2 - Linear Regression

\[\begin{matrix}
x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n&&y^{(1)}\\
x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n&&y^{(2)}\\
\vdots&\vdots&\vdots&\ddots&\vdots&&\vdots\\
x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n&&y^{(m)}\\
\\
\theta_0&\theta_1&\theta_2&\cdots&\theta_n&&
\end{matrix}\]

2.1 Linear Regression with One Variable

Hypothesis Function

\[h_{\theta}(x)=\theta_0+\theta_1x
\]
Cost Function - Squared Error Cost Function

\[J(\theta_0,\theta_1)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2
\]

Goal

\[\min_{(\theta_0,\theta_1)}J(\theta_0,\theta_1)
\]
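
The hypothesis and cost function above can be sketched in a few lines of plain Python. The data set below is made up so that it lies exactly on $y=1+2x$; the function names are illustrative, not taken from the course material.

```python
def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x for a single input x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) = 1/(2m) * sum of squared errors."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Toy data lying exactly on y = 1 + 2x (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(cost(1.0, 2.0, xs, ys))  # 0.0, the line fits every point
print(cost(0.0, 0.0, xs, ys))  # 10.5, the all-zero hypothesis fits poorly
```

Minimizing $J$ means searching for the pair $(\theta_0,\theta_1)$ that drives this number as low as possible; on this toy data the minimum is reached at $(1, 2)$.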

2.2 Multivariate Linear Regression

Hypothesis Function

With the convention $x_0=1$ for every training example, define

\[\theta=
\left[
\begin{matrix}
\theta_0\\
\theta_1\\
\vdots\\
\theta_n
\end{matrix}
\right],\
x=
\left[
\begin{matrix}
x_0\\
x_1\\
\vdots\\
x_n
\end{matrix}
\right]\]

\[\begin{aligned}h_\theta(x)&=\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n\\
&=\theta^Tx
\end{aligned}\]
Cost Function

\[J(\theta^T)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2
\]
Goal

\[\min_{\theta^T}J(\theta^T)
\]
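
As a rough sketch, the multivariate hypothesis $\theta^Tx$ and its cost can be written with plain lists. It assumes each row of `X` already starts with $x_0=1$; the data and the names `theta_true`, `X`, `y` are made up for illustration.

```python
def hypothesis(theta, x):
    """h_theta(x) = theta^T x, where x already includes x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def cost(theta, X, y):
    """J = 1/(2m) * sum over all examples of (h_theta(x^(i)) - y^(i))^2."""
    m = len(X)
    total = sum((hypothesis(theta, row) - yi) ** 2 for row, yi in zip(X, y))
    return total / (2 * m)


# Made-up design matrix: the first column is x_0 = 1, then two features.
X = [
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 0.0],
    [1.0, 3.0, 1.0],
]
theta_true = [1.0, 2.0, 3.0]
# Targets generated from theta_true, so the cost at theta_true is zero.
y = [hypothesis(theta_true, row) for row in X]

print(y)                         # [9.0, 5.0, 10.0]
print(cost(theta_true, X, y))    # 0.0
print(cost([0.0, 0.0, 0.0], X, y))
```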

2.3 Algorithm Optimization

2.3.1 Gradient Descent

Algorithm

Repeat until convergence (simultaneous update for every $j=0, ..., n$, with $x^{(i)}_0=1$)

\[\begin{aligned}
\theta_j
&:=\theta_j-\alpha{\partial\over\partial\theta_j}J(\theta^T)\\
&=\theta_j-\alpha{1\over{m}}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_j
\end{aligned}\]
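
The update rule can be sketched as batch gradient descent in plain Python. Two simplifying assumptions are made here: the loop runs for a fixed number of iterations instead of testing for convergence, and the values of `alpha` and `iterations` are arbitrary choices that happen to work on this made-up data set.

```python
def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Batch gradient descent for linear regression.

    X is a list of rows that already include x_0 = 1; y is the target list.
    Runs a fixed number of iterations (an assumption, not a convergence test).
    """
    m = len(X)
    k = len(X[0])  # n + 1 parameters, including theta_0
    theta = [0.0] * k
    for _ in range(iterations):
        # Prediction error h_theta(x^(i)) - y^(i) for every example.
        errors = [
            sum(t * xj for t, xj in zip(theta, row)) - yi
            for row, yi in zip(X, y)
        ]
        # Partial derivative of J with respect to each theta_j.
        gradient = [
            sum(e * row[j] for e, row in zip(errors, X)) / m
            for j in range(k)
        ]
        # Simultaneous update: all new values come from the old theta.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Toy data on the line y = 1 + 2x, with x_0 = 1 in the first column.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta = gradient_descent(X, y)
print([round(t, 6) for t in theta])  # close to [1.0, 2.0]
```

Building the whole `gradient` list before touching `theta` is what makes the update simultaneous: no $\theta_j$ is computed from a partially updated parameter vector.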

Feature Scaling

For each feature $x_j$, set $$x_j:={{x_j-\mu_j}\over{s_j}}$$

where $\mu_j$ is the mean of feature $x_j$ over the $m$ training examples, and $s_j$ is either its range (maximum minus minimum) or its standard deviation.
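
A minimal sketch of this scaling step, using only the standard library. It assumes `X` holds the raw features without the constant $x_0$ column (a constant column has range and standard deviation 0, so it cannot be scaled this way). The function name `scale_features`, the `use_std` flag, and the data are illustrative.

```python
from statistics import mean, pstdev


def scale_features(X, use_std=False):
    """Mean-normalize each column of X.

    Returns the scaled rows plus the per-feature mean and spread, so the
    same transformation can be applied to new inputs later.
    """
    n = len(X[0])
    columns = [[row[j] for row in X] for j in range(n)]
    mus = [mean(col) for col in columns]
    if use_std:
        spreads = [pstdev(col) for col in columns]  # population standard deviation
    else:
        spreads = [max(col) - min(col) for col in columns]  # range
    scaled = [[(row[j] - mus[j]) / spreads[j] for j in range(n)] for row in X]
    return scaled, mus, spreads


# Made-up data: two features on very different scales.
X = [
    [2104.0, 3.0],
    [1600.0, 3.0],
    [2400.0, 4.0],
    [1416.0, 2.0],
]

scaled, mus, spreads = scale_features(X)
print(mus)      # [1880.0, 3.0]
print(spreads)  # [984.0, 2.0]
print(scaled[0])
```

After scaling, every feature has mean 0 and values spread over a comparable interval, which usually lets gradient descent converge in fewer iterations.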

Learning Rate

2.3.2 Normal Equation(s)

Let

\[X=\left[
\begin{matrix}
x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n\\
x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n\\
\end{matrix}
\right],\
y=\left[
\begin{matrix}
y^{(1)}\\
y^{(2)}\\
\vdots\\
y^{(m)}\\
\end{matrix}
\right]\]

where $X$ is an $m\times(n+1)$ matrix and $y$ is an $m$-dimensional column vector. Then

\[\theta=(X^TX)^{-1}X^Ty
\]
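
A sketch of the normal equation in plain Python follows. Rather than forming $(X^TX)^{-1}$ explicitly, it solves the equivalent linear system $X^TX\,\theta=X^Ty$ by Gaussian elimination with partial pivoting, which gives the same $\theta$ whenever $X^TX$ is invertible. The function name and the singularity tolerance `1e-12` are assumptions made for this example.

```python
def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y for theta.

    X is a list of m rows that already include x_0 = 1; y has m targets.
    Raises ValueError when X^T X is (numerically) noninvertible.
    """
    m, k = len(X), len(X[0])
    # A = X^T X is k x k; b = X^T y has k entries.
    A = [[sum(X[i][r] * X[i][c] for i in range(m)) for c in range(k)]
         for r in range(k)]
    b = [sum(X[i][r] * y[i] for i in range(m)) for r in range(k)]
    # Augmented matrix [A | b].
    M = [A[r] + [b[r]] for r in range(k)]

    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # tolerance chosen for illustration
            raise ValueError("X^T X is noninvertible")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]

    # Back substitution.
    theta = [0.0] * k
    for r in range(k - 1, -1, -1):
        known = sum(M[r][c] * theta[c] for c in range(r + 1, k))
        theta[r] = (M[r][k] - known) / M[r][r]
    return theta


# Toy data on the line y = 1 + 2x, with x_0 = 1 in the first column.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(normal_equation(X, y))  # approximately [1.0, 2.0]
```

Unlike gradient descent, this needs no learning rate and no iterations, but the cost of solving the system grows quickly with the number of features.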

If $X^TX$ is noninvertible, the likely causes are:

- Redundant features: two features are linearly dependent, so one of them should be removed;
- Too many features, e.g. $m\leq n$: remove some features, or apply regularization.

2.4 Polynomial Regression

If a linear $h_\theta(x)$ can't fit the data well, we can change the shape of the curve by making $h_\theta(x)$ a quadratic, cubic or square root function (or any other form). Each new term is treated as an extra feature, so the model stays linear in $\theta$.

e.g.

$h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2,\ x_2=x_1^2$
$h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2+\theta_3x_1^3,\ x_2=x_1^2,\ x_3=x_1^3$
$h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2\sqrt{x_1},\ x_2=\sqrt{x_1}$
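
The substitutions above amount to building new feature columns from $x_1$ and then running ordinary linear regression on them. A minimal sketch, with illustrative function names and made-up numbers:

```python
from math import sqrt


def polynomial_features(x1, degree):
    """Return [x_0, x_1, ..., x_degree] with x_0 = 1 and x_k = x1 ** k."""
    return [x1 ** k for k in range(degree + 1)]


def sqrt_features(x1):
    """Return [x_0, x_1, x_2] with x_0 = 1 and x_2 = sqrt(x_1)."""
    return [1.0, x1, sqrt(x1)]


def hypothesis(theta, features):
    """h_theta(x) = theta^T x on the expanded feature vector."""
    return sum(t * f for t, f in zip(theta, features))


print(polynomial_features(2.0, 2))  # [1.0, 2.0, 4.0]        quadratic
print(polynomial_features(2.0, 3))  # [1.0, 2.0, 4.0, 8.0]   cubic
print(sqrt_features(9.0))           # [1.0, 9.0, 3.0]        square root

# Quadratic hypothesis 1 + 2*x1 + 3*x1^2 evaluated at x1 = 2.
print(hypothesis([1.0, 2.0, 3.0], polynomial_features(2.0, 2)))  # 17.0
```

Because powers of $x_1$ can differ by orders of magnitude, feature scaling becomes especially important when these expanded features are fed to gradient descent.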

Coursera Study Notes | Machine Learning by Stanford University - Andrew Ng