1. Several ways to multiply matrices in numpy:


    # x1: a x n, x2: n x b
    # np.dot(x1, x2): the (a x n)(n x b) matrix product, result shape a x b
    # np.outer(x1, x2): the (n x 1)(1 x n) outer product -- essentially np.ravel(x1) (as a column) times np.ravel(x2) (as a row)
    # np.multiply(x1, x2): the element-wise product, [[x1[0][0]*x2[0][0], x1[0][1]*x2[0][1], ...], ...]
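
A minimal sketch contrasting the three operations (the shapes are just for illustration):

```python
import numpy as np

x1 = np.random.randn(2, 3)          # a x n with a=2, n=3
x2 = np.random.randn(3, 4)          # n x b with b=4
print(np.dot(x1, x2).shape)         # (2, 4): true matrix product

v1, v2 = np.random.randn(3), np.random.randn(4)
# outer product: flattens both inputs, then column times row
print(np.outer(v1, v2).shape)       # (3, 4)
print(np.allclose(np.outer(v1, v2),
                  np.ravel(v1)[:, None] * np.ravel(v2)[None, :]))  # True

print(np.multiply(x1, x1)[0, 0] == x1[0, 0] * x1[0, 0])  # element-wise product
```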

2. Bugs' hometown

Many software bugs in deep learning come from having matrix/vector dimensions that don't fit. If you can keep your matrix/vector dimensions straight you will go a long way toward eliminating many bugs.

3. Common steps for pre-processing a new dataset are:

- Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, ...)
- Reshape the datasets such that each example is now a vector of size (num_px \* num_px \* 3, 1)
- "Standardize" the data

4. Unstructured data:

Unstructured data is a generic label for data that is not contained in a database or some other type of data structure. Unstructured data can be textual or non-textual. Textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software and instant messages. Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files.

5. Activation functions:

  • Choice of activation function:

    1. If the output is 0 or 1 (binary classification), use sigmoid for the output layer and ReLU for the hidden units.
    2. For hidden layers, tanh almost always works better than sigmoid.
    3. ReLU can be upgraded to leaky ReLU (a small slope for negative inputs).
  • Why are ReLU and leaky ReLU often superior to sigmoid and tanh?

    -- Their derivatives stay well above 0 for positive inputs (they don't saturate the way sigmoid and tanh do), so learning is much faster.

  • A linear activation in a hidden layer is more or less useless (stacked linear layers collapse into a single linear map); the output layer, e.g. for regression, is the one exception. A numpy sketch of these activations follows.
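
A minimal numpy sketch of the activations discussed above (the function names are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # keeps a small slope alpha for negative inputs instead of a flat 0
    return np.where(z > 0, z, alpha * z)

z = np.linspace(-3, 3, 7)
print(sigmoid(z), np.tanh(z), relu(z), leaky_relu(z), sep="\n")
```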

6. Regularization:

Initially, \(J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \|w\|_2^2\)

  1. L2 regularization: \(\frac{\lambda}{2m}\sum_{j=1}^{n_x} w_j^2 = \frac{\lambda}{2m}\|w\|_2^2\)

    One aspect in which tanh behaves better than sigmoid here (in terms of regularization): when its input is very close to 0 -- as it is once regularization has shrunk the weights -- tanh(x) is almost linear and centered at 0, while sigmoid(x) sits around 0.5. A minimal sketch of the regularized cost follows.
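
A minimal sketch of the L2-regularized cost for logistic regression (the names `A`, `Y`, `w`, `lambd` are illustrative):

```python
import numpy as np

def l2_regularized_cost(A, Y, w, lambd):
    """A: predictions (1, m); Y: labels (1, m); w: weights; lambd: regularization strength."""
    m = Y.shape[1]
    cross_entropy = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    l2_term = (lambd / (2 * m)) * np.sum(np.square(w))   # (lambda / 2m) * ||w||_2^2
    return cross_entropy + l2_term

A = np.array([[0.9, 0.2, 0.7]])
Y = np.array([[1, 0, 1]])
w = np.array([0.5, -0.3])
print(l2_regularized_cost(A, Y, w, lambd=0.1))
```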

  2. Dropout:

     Method: randomly zero out some of a layer's activations, e.g. A = np.multiply(A, D), where D is a random 0-1 mask (in inverted dropout, A is then divided by keep_prob).

     Note: don't use dropout at test time -- it would add randomness (and extra computation) to the predictions.

     How it works:

       Intuition: the network can't rely on any single feature, so it has to spread out the weights (which shrinks them, similar in effect to L2 regularization).

       Besides, you can set a different rate of dropout per layer -- a lower keep_prob on larger, more overfitting-prone layers. A minimal inverted-dropout sketch follows.
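
A minimal sketch of inverted dropout on one layer's activations (forward pass only):

```python
import numpy as np

def dropout_forward(A, keep_prob=0.8):
    """Apply inverted dropout to an activation matrix A of shape (units, m)."""
    D = np.random.rand(*A.shape) < keep_prob   # random 0-1 mask
    A = np.multiply(A, D)                      # zero out the dropped units
    A = A / keep_prob                          # scale up so the expected activation is unchanged
    return A, D                                # D is cached and reused in backprop

A = np.random.randn(4, 5)
A_dropped, D = dropout_forward(A, keep_prob=0.8)
```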

  3. Data augmentation:

     Apply label-preserving operations to your training images, such as flipping, rotation, and zooming, to keep the model from over-fitting to incidental aspects such as the direction of faces or the size of cats.

  4. Early stopping.

7. Solution to "gradient vanishing or exploding":

     Set W[L] = np.random.randn(shape) * np.sqrt(\(\frac{2}{n^{[L-1]}}\)) if the activation is ReLU (He initialization);

       otherwise use np.random.randn(shape) * np.sqrt(\(\frac{1}{n^{[L-1]}}\)) or np.sqrt(\(\frac{2}{n^{[L-1]}+n^{[L]}}\)) (Xavier/Glorot initialization). A minimal numpy sketch follows.
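
A minimal sketch of these initialization schemes (the layer sizes are illustrative):

```python
import numpy as np

def initialize_layer(n_prev, n_curr, activation="relu"):
    """Weight matrix of shape (n_curr, n_prev), scaled to keep activations from vanishing or exploding."""
    if activation == "relu":
        scale = np.sqrt(2.0 / n_prev)        # He initialization
    else:
        scale = np.sqrt(1.0 / n_prev)        # Xavier initialization (e.g. for tanh)
        # alternative variant: np.sqrt(2.0 / (n_prev + n_curr))
    return np.random.randn(n_curr, n_prev) * scale

W1 = initialize_layer(n_prev=784, n_curr=128, activation="relu")
b1 = np.zeros((128, 1))
```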

8. Gradient Checking:

  1. For each i in range(len(\(\theta\))):

    check whether \(d\theta_{approx}[i] = \frac{J(\theta_1, \theta_2, ..., \theta_i+\epsilon, ...) - J(\theta_1, \theta_2, ..., \theta_i-\epsilon, ...)}{2\epsilon}\) is close to \(d\theta[i] = \frac{\partial{J}}{\partial{\theta_i}}\)

             <==> \(d\theta_{approx} \approx d\theta\)

             <==> \(\frac{\|d\theta_{approx} - d\theta\|_2}{\|d\theta_{approx}\|_2 + \|d\theta\|_2}\) is in an acceptable range: around \(10^{-7}\) is great, while \(10^{-3}\) almost certainly indicates a bug. A minimal numpy sketch follows.
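
A minimal sketch of this check, assuming the parameters are already flattened into a single vector `theta` and that `J` and `dJ` compute the cost and the analytic gradient:

```python
import numpy as np

def gradient_check(J, dJ, theta, epsilon=1e-7):
    """Compare the two-sided numerical gradient of J at theta with the analytic gradient dJ(theta)."""
    grad_approx = np.zeros_like(theta)
    for i in range(len(theta)):
        theta_plus, theta_minus = theta.copy(), theta.copy()
        theta_plus[i] += epsilon
        theta_minus[i] -= epsilon
        grad_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * epsilon)
    grad = dJ(theta)
    # relative difference: around 1e-7 is great, around 1e-3 signals a bug
    return np.linalg.norm(grad_approx - grad) / (np.linalg.norm(grad_approx) + np.linalg.norm(grad))

# toy usage: J(theta) = sum(theta^2), whose gradient is 2 * theta
theta = np.random.randn(5)
print(gradient_check(lambda t: np.sum(t ** 2), lambda t: 2 * t, theta))
```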

  2. Tips:

    • Use grad check only to debug, not during training.
    • If the algorithm fails grad check, look at the individual components (\(db^{[L]}, dW^{[L]}\)) to try to identify the bug.
    • Remember to include the regularization term in both the cost and the gradients.
    • Grad check doesn't work with dropout (set keep_prob = 1.0 while checking).
    • Run at random initialization; perhaps again after some training.

9. Exponentially weighted averages:

  Definition: let \(V_{t} = {\beta}V_{t-1} + (1 - \beta)\theta_t\) (the \(V_t\) are the running averages, and the \(\theta_t\) are the raw data points),

and use \(V_{t}^{corrected} = \frac{V_{t}}{1 - {\beta}^t}\) to correct the initial bias (the warm-up phase where \(V_t\) starts near 0). A short numpy sketch follows.
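
A short sketch of an exponentially weighted average with bias correction (the input sequence is illustrative):

```python
import numpy as np

def ewa(thetas, beta=0.9, correct_bias=True):
    """Return the exponentially weighted averages of a 1-D sequence."""
    v, averages = 0.0, []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta
        averages.append(v / (1 - beta ** t) if correct_bias else v)
    return np.array(averages)

noisy = np.sin(np.linspace(0, 10, 100)) + 0.3 * np.random.randn(100)
smoothed = ewa(noisy, beta=0.9)
```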

  Usage (gradient descent with momentum): when the updates oscillate, the movement in the "vertical" direction averages out to almost zero, so replacing the raw gradients with their EWA damps the oscillation and keeps the updates from diverging, while progress in the useful direction accumulates.

  On iteration t:

    Compute dW, db on the current mini-batch

    \(v_{dW} = {\beta}v_{dW} + (1 - \beta)dW\)

    \(v_{db} = {\beta}v_{db} + (1 - \beta)db\)

    \(W = W - {\alpha}v_{dW}, b = b - {\alpha}v_{db}\)

    Hyperparameters: \(\alpha\), \(\beta\) (a common default is \(\beta = 0.9\)). A minimal sketch of this update follows.
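
A minimal sketch of one momentum update step (the gradients here are random stand-ins for a real mini-batch):

```python
import numpy as np

def momentum_update(W, b, dW, db, v_dW, v_db, alpha=0.01, beta=0.9):
    """One gradient-descent-with-momentum step, following the update rules above."""
    v_dW = beta * v_dW + (1 - beta) * dW
    v_db = beta * v_db + (1 - beta) * db
    W = W - alpha * v_dW
    b = b - alpha * v_db
    return W, b, v_dW, v_db

W, b = np.random.randn(3, 4), np.zeros((3, 1))
v_dW, v_db = np.zeros_like(W), np.zeros_like(b)
dW, db = np.random.randn(3, 4), np.random.randn(3, 1)
W, b, v_dW, v_db = momentum_update(W, b, dW, db, v_dW, v_db)
```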
