week1

An image of 64*64 pixels with three color channels (red, green, blue) corresponds to three 64*64 matrices of real numbers.

To feed these matrices to the algorithm as a vector, unroll their pixel values into a single input vector x.

Reading the entries one by one, row by row, from the red channel to the green and then the blue channel, gives a \(64*64*3 \times 1\) column vector x, i.e. a 64*64*3-dimensional vector.

Let \(n_x = 64*64*3\) denote the dimension of the feature vector x.

All training examples are then stacked as columns: \(X = \begin{bmatrix}\mid & \mid &\mid &&\mid \\ x^{(1)}& x^{(2)}& x^{(3)}& \cdots & x^{(m)}\\ \mid & \mid &\mid &&\mid \end{bmatrix}\) (an \(n_x \times m\) matrix)

Note that it is not \(X = \begin{bmatrix} (x^{(1)})^T\\ \vdots \\ (x^{(m)})^T \end{bmatrix}\); stacking the examples as columns, as above, makes the later computations simpler.

\(Y=\begin{bmatrix}y^{(1)} & y^{(2)} & \cdots & y^{(m)}\end{bmatrix}\)
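
A minimal NumPy sketch of this layout (the array names `images` and `labels` are hypothetical stand-ins for real data):

```python
import numpy as np

m = 10                                       # toy number of training examples
images = np.random.rand(m, 64, 64, 3)        # stand-in for real image data (height, width, RGB)
labels = np.random.randint(0, 2, size=m)     # stand-in for 0/1 labels

# Read each image channel by channel (R, then G, then B), row by row,
# into one column of length n_x = 64*64*3, and stack the columns side by side.
X = images.transpose(0, 3, 1, 2).reshape(m, -1).T   # shape (n_x, m) = (12288, m)
Y = labels.reshape(1, m)                             # shape (1, m)

assert X.shape == (64 * 64 * 3, m) and Y.shape == (1, m)
```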

The form \(\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_{n_x} \\ \end{bmatrix}\) used in the earlier machine learning course is no longer used; instead write \(\large b = \theta_0, \; w = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_{n_x} \\ \end{bmatrix}\) (it will be easier to just keep \(b\) and \(w\) as separate parameters).

The output is then: \(\large \hat{y}^{(i)} = \sigma(w^Tx^{(i)}+b),{\rm where\;}\sigma(z^{(i)}) = \frac{1}{1+e^{-z^{(i)}}}\)

\(\text{Given \{}(x^{(1)}, y^{(1)}),\dots,(x^{(m)},y^{(m)})\text{\}, want } \hat{y}^{(i)} \approx y^{(i)}\)


week2

Loss Function/Error Function

Loss function / error function: used to measure how well our algorithm is doing on a single training example

\[{\cal L}(\hat{y},y) = -y\log(\hat{y})-(1-y)\log(1-\hat{y})
\]

Cost Function

\[J(w,b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\hat{y}^{(i)}+(1-y^{(i)})\log(1-\hat{y}^{(i)})\right]
\]
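
A short NumPy sketch of this cost, assuming `A` holds the predictions \(\hat{y}^{(i)}\) as a (1, m) row vector and `Y` holds the labels:

```python
import numpy as np

def cost(A, Y):
    """Cross-entropy cost J(w,b): the loss L(y_hat, y) averaged over the m columns."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

A = np.array([[0.9, 0.2, 0.7]])   # toy predictions
Y = np.array([[1,   0,   1  ]])   # toy labels
print(cost(A, Y))                 # small, since the predictions roughly match the labels
```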

Gradient Descent

See the notes for the ML course; it is essentially the same thing.

Vectorization:

```python
# Non-vectorized (slow)
z = 0
for i in range(n_x):
    z += w[i] * x[i]
z += b

# Vectorized
import numpy as np
z = np.dot(w, x) + b
```

Whenever possible, avoid explicit for-loops (Python is an interpreted language); NumPy's built-in functions let you write the same thing concisely and efficiently.
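
A small timing sketch (toy sizes; the exact numbers depend on the machine) that shows the gap:

```python
import time
import numpy as np

n_x = 1_000_000
w = np.random.rand(n_x)
x = np.random.rand(n_x)

t0 = time.time()
z_loop = 0.0
for i in range(n_x):               # explicit for-loop: interpreted, one element at a time
    z_loop += w[i] * x[i]
t1 = time.time()

z_vec = np.dot(w, x)               # vectorized: a single optimized call
t2 = time.time()

print(f"loop: {1000*(t1-t0):.1f} ms, vectorized: {1000*(t2-t1):.1f} ms")
print(np.isclose(z_loop, z_vec))   # True: same result, very different speed
```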

Vectorizing Logistic Regression

\(X = \begin{bmatrix} \lvert & \lvert & \cdots & \lvert \\ x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ \lvert & \lvert & \cdots & \lvert \end{bmatrix} \in \mathbb{R}^{n_x \times m}\)

\(Z = \begin{bmatrix}z^{(1)} & z^{(2)} & \cdots & z^{(m)} \end{bmatrix} = w^TX + \begin{bmatrix}b &b & \cdots & b \end{bmatrix}\)

\(z^{(i)}\) is the input to the sigmoid function.

\(A = \begin{bmatrix}a^{(1)} & a^{(2)} & \cdots & a^{(m)} \end{bmatrix} = \sigma(Z)\)

(Elements with different superscripts here all belong to the same layer, unlike in the ML course. In \(a^{[j](i)}\), the square brackets give the layer number and the parentheses give the index \(i\) of the training example.)

```python
import numpy as np
Z = np.dot(w.T, X) + b
# Python automatically takes this real number b and expands it out to a 1*m row vector
```

Gradient Output

\({\rm d}z^{(i)} = a^{(i)} - y^{(i)}\)

\(\begin{align}{\rm d}Z &= \begin{bmatrix}{\rm d}z^{(1)} & {\rm d}z^{(2)} & \cdots & {\rm d}z^{(m)} \end{bmatrix} \\&= A-Y = \begin{bmatrix}a^{(1)} - y^{(1)} & a^{(2)} - y^{(2)} & \cdots & a^{(m)} - y^{(m)} \end{bmatrix} \end{align}\)

\({\rm d}b = \frac{1}{m}\sum_{i=1}^{m}{\rm d}z^{(i)} = \frac{1}{m}\text{np.sum(d}Z\text{)}\)

\({\rm d}w = \frac{1}{m}X{\rm d}Z^T\)

A single iteration without an explicit for-loop (vectorized):

\[\begin{align}
\downarrow&\begin{cases}
Z & = w^TX+b\\
& = {\rm np.dot(}w{\rm .T, }X{\rm)}+b\\
A & = \sigma(Z)\\
{\rm d}Z &= A-Y \\
{\rm d}w &= \frac{1}{m}X{\rm d}Z^T\\
{\rm d}b &= \frac{1}{m}{\rm np.sum(d}Z{\rm )}\\
\end{cases}\\\\
w& := w - \alpha{\rm d}w\\
b &:= b - \alpha{\rm d}b
\end{align}
\]

If multiple iterations are needed, the outermost explicit for-loop over the iterations is unavoidable.
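
Putting the pieces together, a minimal sketch of the full training loop (not the course's exact code), assuming `X` has shape (n_x, m) and `Y` has shape (1, m):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, Y, alpha=0.01, num_iterations=1000):
    n_x, m = X.shape
    w = np.zeros((n_x, 1))                 # zero initialization is fine for logistic regression
    b = 0.0
    for _ in range(num_iterations):        # the outer loop over iterations stays explicit
        Z = np.dot(w.T, X) + b             # (1, m); the scalar b is broadcast
        A = sigmoid(Z)                     # predictions y_hat
        dZ = A - Y
        dw = np.dot(X, dZ.T) / m           # (n_x, 1)
        db = np.sum(dZ) / m                # scalar
        w -= alpha * dw
        b -= alpha * db
    return w, b
```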

Broadcasting

Use reshape() to make sure a matrix has the dimensions you intend.

An example to illustrate NumPy's broadcasting mechanism:

```python
>>> import numpy as np
>>> a = np.arange(0,6).reshape(6,1)
>>> a
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5]])
>>> b = np.arange(0,5)
>>> b
array([0, 1, 2, 3, 4])
>>> a * b
array([[ 0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4],
       [ 0,  2,  4,  6,  8],
       [ 0,  3,  6,  9, 12],
       [ 0,  4,  8, 12, 16],
       [ 0,  5, 10, 15, 20]])
>>> a + b
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8],
       [5, 6, 7, 8, 9]])
```

In other words, when a matrix is combined with a number or a vector via +, -, *, or /, NumPy expands the number/vector by copying it into a matrix of compatible shape.

Note that this can cause strange bugs in places where you would expect an exception to be thrown:

For example, I sometimes expect adding a row vector to a column vector to raise an exception, but NumPy just broadcasts and computes a result...
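
A tiny sketch of that trap: adding a (1, n) row vector to an (n, 1) column vector silently yields an (n, n) matrix instead of an error:

```python
import numpy as np

row = np.ones((1, 3))        # shape (1, 3)
col = np.ones((3, 1))        # shape (3, 1)
print((row + col).shape)     # (3, 3): broadcasting quietly built a matrix, no exception

# If the shapes must match exactly, check them yourself; NumPy will not complain:
if row.shape != col.shape:
    print("shape mismatch:", row.shape, "vs", col.shape)
```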

NumPy pitfalls

```python
>>> import numpy as np
>>> a = np.random.randn(5)
>>> a
array([-0.19837642, -0.16758652,  1.57705505,  0.13033745, -0.81073889])
>>> a.shape
(5,)
# which is called a rank 1 array in Python and is neither a row vector nor a column vector
>>> a.T
array([-0.19837642, -0.16758652,  1.57705505,  0.13033745, -0.81073889])
# which is the same as 'a' itself
>>> np.dot(a, a.T)
3.2288264718632416
# a plain number, not the 1x1 matrix (like array([[55]])) you might expect
```

Do not use "rank 1 arrays" with shapes like (5,) or (n,); instead state explicitly that it is an \(m \times n\) matrix:

```python
>>> a = np.random.randn(5,1)
>>> a
array([[ 0.7643396 ],
       [-1.66945103],
       [ 1.66235712],
       [-0.06892102],
       [-1.61347409]])
>>> a.T
array([[ 0.7643396 , -1.66945103,  1.66235712, -0.06892102, -1.61347409]])
```

Note the difference between array([-0.19837642, -0.16758652, 1.57705505, 0.13033745, -0.81073889]) and array([[ 0.7643396 , -1.66945103, 1.66235712, -0.06892102, -1.61347409]]) (the latter has two layers of brackets): the former is a rank 1 array while the latter is a genuine \(1 \times 5\) matrix (just as in C, a matrix is represented by a two-dimensional array). (I also think "rank 1 array" is more accurately rendered as "one-dimensional array".)

You can use assert statements to make sure a vector has the dimensions you expect.

When you get a rank 1 array, you can use a.reshape to transform it into an (n,1) array or a (1,n) array.
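
A small sketch combining both tips:

```python
import numpy as np

a = np.random.randn(5)        # rank 1 array, shape (5,)
a = a.reshape(5, 1)           # turn it into an explicit column vector
assert a.shape == (5, 1)      # cheap sanity check on the dimensions

b = np.random.randn(1, 5)     # an explicit row vector from the start
print(np.dot(a, b).shape)     # (5, 5): column times row gives an outer product, as expected
```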

Logistic Regression Cost Function

\[\left.
\begin{array}{l}
\text{If y=1:}\quad p(y|x)=\hat{y}\\
\text{If y=0:}\quad p(y|x)=1-\hat{y}
\end{array}
\right\}
p(y|x) = \hat{y}^y\cdot (1-\hat{y})^{1-y}\\
\,\\
\begin{align}
\therefore {\rm log}(p(y|x)) &= y\cdot log\,\hat{y} + (1-y)\cdot log\, (1-\hat{y}) \\
&= -\mathcal{L}(\hat{y},y)
\end{align}
\]

Therefore:

\[\begin{align}
{\rm log }[p(\text{labels in training set})] &= {\rm log } \prod_{i=1}^mp(y^{(i)}|x^{(i)})\\
&=\sum_{i=1}^m {\rm log\,}p(y^{(i)}|x^{(i)})\\
&=\sum_{i=1}^m-\mathcal{L}(\hat{y}^{(i)},y^{(i)})\\
&=-\sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)},y^{(i)})
\end{align}\\
\text{Cost: }J(w,b) = \frac{1}{m}\sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)},y^{(i)})
\]

This is maximum likelihood estimation (assuming the training examples are i.i.d.); minimizing the cost \(J(w,b)\) is equivalent to maximizing this log-likelihood.


week3

\(Z^{[j]} = W^{[j]}A^{[j-1]} + b^{[j]} = W^{[j]}\begin{bmatrix} | & | & | & \\ a^{[j-1](1)} & a^{[j-1](2)} & a^{[j-1](3)} & \cdots \\ | & | & | & \end{bmatrix} + b^{[j]} = \begin{bmatrix} | & | & | & \\ z^{[j](1)} & z^{[j](2)} & z^{[j](3)} & \cdots \\ | & | & | & \end{bmatrix}\)

where \((i) \in [(1),(m)],\quad [j] \in [[1],[n]],\quad X = A^{[0]}\)
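
A quick sketch of the shapes involved in this step (toy sizes; `n_prev` and `n_j` stand for the numbers of units in layers \(j-1\) and \(j\)):

```python
import numpy as np

m, n_prev, n_j = 8, 4, 3                # toy sizes: examples, units in layer j-1, units in layer j
A_prev = np.random.randn(n_prev, m)     # A^[j-1]: one column per training example
W = np.random.randn(n_j, n_prev)        # W^[j]
b = np.zeros((n_j, 1))                  # b^[j], broadcast across the m columns

Z = np.dot(W, A_prev) + b               # Z^[j]
assert Z.shape == (n_j, m)
```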

Other Activation Functions

①\(tanh(z)\) function:

\[a= \tanh(z)=\frac{e^z -e^{-z}}{e^z +e^{-z}}\text{ , where } \tanh(z) \in (-1,1) \text{ and } \tanh(0)=0
\]

\(tanh(z)\) centers the data around 0 (the sigmoid function centers it around 0.5).

From then on, use the sigmoid function only when \(0 \le \hat{y} \le 1\) is required (i.e. for binary-classification output), because \(tanh\) is almost strictly better than sigmoid...

②Rectified Linear Unit (ReLU): \(a = \max(0,z)\)

When not sure what to use for your hidden layers, you can use the ReLU function.

Disadvantage of ReLU: when \(z\) is negative, the output (and hence its slope) is 0.

The so-called Leaky ReLU can be used to overcome this disadvantage.

Leaky ReLU: \(a = \max(0.01z, z)\)

ReLU keeps the slope from shrinking (with sigmoid and \(tanh(z)\), the slope tends to 0 as \(z\rightarrow \infty\), which slows down learning).

ReLU is the most commonly used activation function.

③Linear Activation Function (identity): \(g(z)=z\)

Use a linear activation function (\(g(z)=z\)) at the output layer only when solving a regression problem. For example, when predicting house prices, \(y\) is not limited to 0 and 1 (\(y \in \mathbb{R}\)), so \(g(z)=z\) can be used for the output. Hidden units should not use a linear activation function; use tanh/ReLU/Leaky ReLU instead.
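
A hedged NumPy sketch of these activation functions (the function names are my own, not from the course code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)               # equals (e^z - e^-z) / (e^z + e^-z)

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, slope=0.01):
    return np.maximum(slope * z, z)

def linear(z):
    return z                        # only for regression output layers
```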

Derivatives of Activation Functions

  • Sigmoid: \(g\prime(z) = g(z)(1-g(z))\)

  • \(tanh(z)\): \(g\prime(z) = 1-(tanh(z))^2\)

  • ReLU: \(g\prime(z) = \begin{cases}1, \text{if }z\ge0 \\0, \text{if }z\lt0 \end{cases}\)
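
And a matching sketch of the derivatives (the value of the ReLU derivative at exactly z = 0 is a convention):

```python
import numpy as np

def sigmoid_prime(z):
    g = 1.0 / (1.0 + np.exp(-z))
    return g * (1 - g)

def tanh_prime(z):
    return 1 - np.tanh(z) ** 2

def relu_prime(z):
    return (z >= 0).astype(float)   # 1 where z >= 0, else 0
```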

Gradient Descent for Neural Networks

Parameters : \(w^{[1]},b^{[1]},w^{[2]},b^{[2]}\)

Cost Function : \(J(w^{[1]},b^{[1]},w^{[2]},b^{[2]})= \frac{1}{m}\sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)},y^{(i)})\)

Gradient Descent:

\[\begin{align}
&\text{Repeat \{}\\
&\quad \text{compute predictions } (\hat{y}^{(i)}, i = 1,\dots,m) \\
&\quad {\rm d}w^{[1]} = \frac{\partial J}{\partial w^{[1]}}, {\rm d}b^{[1]} = \frac{\partial J}{\partial b^{[1]}},\dots\\
&\quad w^{[1]} = w^{[1]} - \alpha {\rm d}w^{[1]}\\
&\quad b^{[1]} = b^{[1]} - \alpha {\rm d}b^{[1]}\\
&\quad w^{[2]} = w^{[2]} - \alpha {\rm d}w^{[2]}\\
&\quad b^{[2]} = b^{[2]} - \alpha {\rm d}b^{[2]}\\
&\text{\}}
\end{align}
\]

Forward Propagation :

\[\begin{align}
Z^{[1]} &= w^{[1]}X + b^{[1]}\\
A^{[1]} &= g^{[1]}(Z^{[1]})\\
Z^{[2]} &= w^{[2]}A^{[1]} + b^{[2]}\\
A^{[2]} &= g^{[2]}(Z^{[2]}) = \sigma(Z^{[2]})
\end{align}
\]

Backward Propagation :

\[\begin{align}
{\rm d}Z^{[2]} &= A^{[2]} - Y, \quad Y = \begin{bmatrix}y^{(1)} & y^{(2)} & \dots & y^{(m)}\end{bmatrix}\\
{\rm d}w^{[2]} &= \frac{1}{m} {\rm d}Z^{[2]} A^{[1]T}\\
{\rm d}b^{[2]} &= \frac{1}{m}\text{np.sum(d}Z^{[2]}\text{, axis=1, keepdims=True)}\\
{\rm d}Z^{[1]} &= w^{[2]T}{\rm d}Z^{[2]}\; .* \; g^{[1]\prime}(Z^{[1]})\\
{\rm d}w^{[1]} &= \frac{1}{m} {\rm d}Z^{[1]}X^T\\
{\rm d}b^{[1]} &= \frac{1}{m}\text{np.sum(d}Z^{[1]}\text{, axis=1, keepdims=True)}\\
\end{align}
\]

Note: axis=1 means summing horizontally, and keepdims=True prevents the output from being a rank 1 array. You can also call reshape explicitly rather than relying on these parameters.

Another note: since \(A^{[1]} = g^{[1]}(Z^{[1]})\) and (for tanh) \(g^{[1]\prime}(z) = 1-a^2\), we have \(g^{[1]\prime}(Z^{[1]}) = 1-(A^{[1]})^2\), i.e. \({\rm d}Z^{[1]} = w^{[2]T}{\rm d}Z^{[2]}\; .* \; (1-(A^{[1]})^2)\).
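
A minimal sketch of one gradient step for this two-layer network (tanh hidden layer, sigmoid output); the parameter names W1/b1/W2/b2 correspond to \(w^{[1]},b^{[1]},w^{[2]},b^{[2]}\) and this is not the course's exact code. `X` is (n_x, m) and `Y` is (1, m):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, Y, W1, b1, W2, b2, alpha=0.01):
    m = X.shape[1]

    # Forward propagation
    Z1 = np.dot(W1, X) + b1          # (n_h, m)
    A1 = np.tanh(Z1)                 # g^[1] = tanh
    Z2 = np.dot(W2, A1) + b2         # (1, m)
    A2 = sigmoid(Z2)                 # g^[2] = sigmoid

    # Backward propagation
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)    # elementwise; tanh'(Z1) = 1 - A1^2
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    # Gradient descent update
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2
    return W1, b1, W2, b2
```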

Random Initialization

For a neural network, if you initialize all the weights to zero and then apply gradient descent, it won't work: every hidden unit computes the same function and receives the same update, so the symmetry is never broken. The weights should be initialized to small random values (small, so that tanh/sigmoid units do not start out saturated); the biases may start at zero.
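
A sketch of the usual fix (the layer sizes and the 0.01 scale are typical choices, not mandated):

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y=1, scale=0.01):
    W1 = np.random.randn(n_h, n_x) * scale   # small random values break the symmetry
    b1 = np.zeros((n_h, 1))                  # biases can safely start at zero
    W2 = np.random.randn(n_y, n_h) * scale   # small values keep tanh/sigmoid away from saturation
    b2 = np.zeros((n_y, 1))
    return W1, b1, W2, b2
```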
