Deep Learning Specialization Notes
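1. Several kinds of matrix multiplication in numpy:
# x1: a x n, x2: n x b
np.dot(x1, x2): (a x n) * (n x b) -> a x b, the matrix product
np.outer(x1, x2): (n x 1) * (1 x m) -> n x m, the outer product # in effect: np.ravel(x1)[:, None] * np.ravel(x2)[None, :]
np.multiply(x1, x2): [[x1[0][0]*x2[0][0], x1[0][1]*x2[0][1], ...], ...] # element-wise product; the shapes must match or broadcast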
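A small sketch of the three products, using made-up arrays (the values of `x1` and `x2` are illustrative only), to make the resulting shapes concrete:

```python
import numpy as np

x1 = np.array([[1, 2, 3],
               [4, 5, 6]])          # shape (2, 3)
x2 = np.array([[1, 0],
               [0, 1],
               [1, 1]])             # shape (3, 2)

dot = np.dot(x1, x2)                # matrix product, shape (2, 2)
outer = np.outer(x1, x2)            # both inputs are flattened first, shape (6, 6)
elementwise = np.multiply(x1, x1)   # element-wise product, shape (2, 3)

print(dot.shape, outer.shape, elementwise.shape)
```

Note that `np.outer` ignores the original 2-D layout of its arguments, which is why its result here is 6 x 6.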
2. Bugs' hometown
Many software bugs in deep learning come from having matrix/vector dimensions that don't fit. If you can keep your matrix/vector dimensions straight you will go a long way toward eliminating many bugs.
3. Common steps for pre-processing a new dataset are:
- Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, ...)
- Reshape the datasets such that each example is now a vector of size (num_px \* num_px \* 3, 1)
- "Standardize" the data
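The three steps above can be sketched as follows. The random array `train_x_orig` is a hypothetical stand-in for a real image dataset, and dividing by 255 is just one simple way to "standardize" 8-bit pixel values:

```python
import numpy as np

# Hypothetical dataset: 5 RGB images of 4 x 4 pixels with values in 0..255.
rng = np.random.default_rng(0)
train_x_orig = rng.integers(0, 256, size=(5, 4, 4, 3), dtype=np.uint8)

# Step 1: figure out the dimensions.
m_train = train_x_orig.shape[0]
num_px = train_x_orig.shape[1]

# Step 2: flatten every image into one column of size num_px * num_px * 3.
train_x_flat = train_x_orig.reshape(m_train, -1).T   # shape (48, 5)

# Step 3: scale the pixel values into [0, 1].
train_x = train_x_flat / 255.0

print(m_train, num_px, train_x.shape)
```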
4. Unstructured data:
Unstructured data is a generic label for describing data that is not contained in a database or some other type of data structure. Unstructured data can be textual or non-textual. Textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software and instant messages. Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files.
5. Activation functions:
Choice of activation function:
- If the output is either 0 or 1, use sigmoid for the output layer and ReLU for all other units.
- Except for the output layer, tanh does better than sigmoid.
- ReLU can be upgraded to leaky ReLU.
Why are ReLU and leaky ReLU often superior to sigmoid and tanh?
-- Their derivatives stay well away from 0 over most of the input range (sigmoid and tanh saturate), so learning is usually much faster.
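A hidden layer with a linear activation is more or less useless, because a stack of linear layers is still a single linear map; the output layer is the exception (for example in regression).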
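The remark about linear hidden layers can be checked numerically. The sketch below uses random, purely illustrative weights and compares two stacked linear layers with a single merged layer, then repeats the forward pass with a ReLU in between:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))                     # 3 features, 5 examples
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))

# Two layers with an identity (linear) activation ...
two_linear_layers = W2 @ (W1 @ X + b1) + b2

# ... equal one linear layer with merged parameters.
W_merged, b_merged = W2 @ W1, W2 @ b1 + b2
one_linear_layer = W_merged @ X + b_merged

# With a ReLU in between, the merged layer no longer reproduces the output.
with_relu = W2 @ np.maximum(0, W1 @ X + b1) + b2

print(np.allclose(two_linear_layers, one_linear_layer))
print(np.allclose(with_relu, one_linear_layer))
```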
6. Regularization:
Cost with an L2 penalty: \(J(w, b) = \frac{1}{m}\sum_{i=1}^{m}L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m}||w||_2^2\)
L2 regularization: \(\frac{\lambda}{2m}\sum_{j=1}^{n_x}w_j^2 = \frac{\lambda}{2m}||w||_2^2\)
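A minimal sketch of this cost, assuming the loss L is the binary cross-entropy (the notes do not say which loss is used). The function name `l2_regularized_cost` and the sample values are made up for illustration:

```python
import numpy as np

def l2_regularized_cost(y_hat, y, w, lam):
    """Mean cross-entropy over m examples plus (lam / (2m)) * ||w||_2^2."""
    m = y.shape[1]
    cross_entropy = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m
    l2_penalty = lam / (2 * m) * np.sum(np.square(w))
    return cross_entropy + l2_penalty

y = np.array([[1, 0, 1, 1]])                 # labels, shape (1, m)
y_hat = np.array([[0.9, 0.2, 0.7, 0.6]])     # predictions, shape (1, m)
w = np.array([[0.5], [-1.0], [2.0]])         # weights, shape (n_x, 1)

cost_plain = l2_regularized_cost(y_hat, y, w, lam=0.0)
cost_reg = l2_regularized_cost(y_hat, y, w, lam=0.1)
print(cost_plain, cost_reg)
```

The two costs differ only by the penalty term, which grows with both \(\lambda\) and the size of the weights.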
One respect in which tanh is better than sigmoid (in terms of regularization): when regularization keeps the weights small, x stays very close to 0, where tanh(x) is almost linear (tanh(x) ≈ x, slope close to 1), while the slope of sigmoid(x) is at most 0.25.
Dropout:
Method: randomly zero out some hidden units, e.g. A = np.multiply(A, D), where D is a random 0-1 array (zeroing a unit has the same effect as zeroing that unit's outgoing weights).
Caveat: don't use dropout at test time; it makes the predictions random, and averaging over many random runs is costly.
How it works:
Intuition: a unit can't rely on any one feature, so it has to spread out its weights (which shrinks them).
Besides, you can set a different keep probability ("keep_prob") for each layer, for example a lower keep_prob on layers with more parameters.
- Data augmentation:
Apply label-preserving operations to your images, such as flipping, rotation and zooming, so that the model does not overfit to incidental aspects such as the orientation of faces or the size of cats.
- Early stopping.
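As an illustration of the dropout item above, here is a sketch of "inverted" dropout applied to a layer's activations. The rescaling by `keep_prob` is an assumption taken from common practice rather than from these notes, and `inverted_dropout` is a hypothetical helper name:

```python
import numpy as np

def inverted_dropout(a, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob, then rescale the survivors."""
    mask = rng.random(a.shape) < keep_prob      # boolean 0-1 mask
    return a * mask / keep_prob, mask

rng = np.random.default_rng(0)
a = np.ones((4, 1000))                          # stand-in activations
a_train, mask = inverted_dropout(a, keep_prob=0.8, rng=rng)

# At test time no mask is applied; the activations are used unchanged.
a_test = a

print(mask.mean(), a_train.mean())
```

Dividing by `keep_prob` keeps the expected value of the activations unchanged, so nothing needs to be rescaled at test time.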
7. Solution to "gradient vanishing or exploding":
Set WL = np.random.randn(shape) * np.sqrt(\(\frac{2}{n^{[L-1]}}\)) if activation_function == "ReLU"
else: np.random.randn(shape) * np.sqrt(\(\frac{1}{n^{[L-1]}}\)) or np.sqrt(\(\frac{2}{n^{[L-1]}+n^{[L]}}\)) (Xavier initialization)
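The scaling rule can be sketched as below. The helper `init_layer` and the layer sizes are hypothetical, and only the \(\frac{2}{n^{[L-1]}}\) and \(\frac{1}{n^{[L-1]}}\) variants are shown:

```python
import numpy as np

def init_layer(n_prev, n_curr, activation, rng):
    """Gaussian weights of shape (n_curr, n_prev), scaled by the fan-in n_prev."""
    if activation == "relu":
        scale = np.sqrt(2.0 / n_prev)
    else:                                   # e.g. tanh
        scale = np.sqrt(1.0 / n_prev)
    return rng.standard_normal((n_curr, n_prev)) * scale

rng = np.random.default_rng(0)
W_relu = init_layer(500, 300, "relu", rng)
W_tanh = init_layer(500, 300, "tanh", rng)

print(W_relu.std(), W_tanh.std())
```

With this scaling the variance of each weight shrinks as the number of inputs grows, which keeps the pre-activations from blowing up or vanishing as layers are stacked.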
8. Gradient Checking:
for i in range(len(\(\theta\))):
check whether \(d\theta_{approx}[i] = \frac{J(\theta_1, \theta_2, ..., \theta_i+\epsilon, ...) - J(\theta_1, \theta_2, ..., \theta_i-\epsilon, ...)}{2\epsilon}\) ?= \(d\theta[i] = \frac{\partial{J}}{\partial{\theta_i}}\)
<==> \(d\theta_{approx}\) ?= \(d\theta\)
<==> \(\frac{||d\theta_{approx} - d\theta||_2}{||d\theta_{approx}||_2+||d\theta||_2}\) is within an acceptable range: \(10^{-7}\) is great, and \(10^{-3}\) is wrong.
Tips:
- Use it only for debugging, not during training.
- If the algorithm fails grad check, look at the individual components (\(db^{[L]}, dw^{[L]}\)) to try to identify the bug.
- Remember to include the regularization term.
- It doesn't work together with dropout.
- Run at random initialization; perhaps again after some training.
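A minimal sketch of the check, on a toy cost function chosen only for illustration (`J`, `good_grad` and `buggy_grad` are hypothetical and not part of the course material):

```python
import numpy as np

def grad_check(J, grad_J, theta, eps=1e-7):
    """Relative difference between the analytic gradient and a two-sided estimate."""
    d_theta = grad_J(theta)
    d_theta_approx = np.zeros_like(theta)
    for i in range(len(theta)):
        theta_plus, theta_minus = theta.copy(), theta.copy()
        theta_plus[i] += eps
        theta_minus[i] -= eps
        d_theta_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
    numerator = np.linalg.norm(d_theta_approx - d_theta)
    denominator = np.linalg.norm(d_theta_approx) + np.linalg.norm(d_theta)
    return numerator / denominator

J = lambda th: np.sum(th ** 3)            # toy cost
good_grad = lambda th: 3 * th ** 2        # correct derivative
buggy_grad = lambda th: 2 * th ** 2       # deliberately wrong derivative

theta = np.array([1.0, -2.0, 0.5])
diff_ok = grad_check(J, good_grad, theta)
diff_bad = grad_check(J, buggy_grad, theta)
print(diff_ok, diff_bad)
```

The correct gradient gives a tiny relative difference, while the buggy one lands far above the \(10^{-3}\) threshold.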
9. Exponentially weighted averages:
Definition: let \(V_{t} = {\beta}V_{t-1} + (1 - \beta)\theta_t\) (the \(V_t\) are the averages, and the \(\theta_t\) are the original discrete data),
and \(V_{t}^{corrected} = \frac{V_{t}}{1 - {\beta}^t}\) (to correct the initial bias).
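A short sketch of the definition, assuming the average starts from \(V_0 = 0\) (the notes do not state the starting value). The function name `ewa` and the constant test data are illustrative:

```python
import numpy as np

def ewa(theta, beta=0.9, bias_correction=True):
    """Exponentially weighted average of a 1-D sequence, starting from V_0 = 0."""
    v = 0.0
    out = []
    for t, x in enumerate(theta, start=1):
        v = beta * v + (1 - beta) * x
        out.append(v / (1 - beta ** t) if bias_correction else v)
    return np.array(out)

theta = np.full(50, 10.0)                 # constant data makes the bias easy to see
raw = ewa(theta, bias_correction=False)
corrected = ewa(theta)
print(raw[:3], corrected[:3])
```

Without the correction the first averages are pulled toward 0; dividing by \(1 - \beta^t\) removes that start-up bias.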
Usage (gradient descent with momentum): when the updates oscillate vertically while moving only slowly toward the minimum,
the average of the vertical movement is almost zero, so you can use EWA to average the gradients and keep the updates from diverging.
On iteration t:
Compute dW, db on the current mini-batch
\(v_{dW} = {\beta}v_{dW} + (1 - \beta)dW\)
\(v_{db} = {\beta}v_{db} + (1 - \beta)db\)
\(W = W - {\alpha}v_{dW}, b = b - {\alpha}v_{db}\)
Hyperparameters: \(\alpha\), \({\beta}(=0.9)\)
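The update rule above can be sketched on a toy problem. The quadratic objective, the helper name `momentum_step` and the step size are assumptions chosen only to make the example runnable:

```python
import numpy as np

def momentum_step(W, b, dW, db, v_dW, v_db, alpha=0.01, beta=0.9):
    """One gradient-descent-with-momentum update; returns the new W, b, v_dW, v_db."""
    v_dW = beta * v_dW + (1 - beta) * dW
    v_db = beta * v_db + (1 - beta) * db
    return W - alpha * v_dW, b - alpha * v_db, v_dW, v_db

# Toy objective (hypothetical): J(W, b) = 0.5 * ||W||^2 + 0.5 * b^2, so dW = W and db = b.
W, b = np.array([[1.0, -2.0]]), 3.0
v_dW, v_db = np.zeros_like(W), 0.0
for t in range(500):
    W, b, v_dW, v_db = momentum_step(W, b, W, b, v_dW, v_db, alpha=0.1)

print(W, b)
```

Both parameters shrink toward the minimum at 0; the velocities `v_dW` and `v_db` carry the averaged gradient from one iteration to the next.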