『cs231n』Assignment 3, Problem 2: Understanding the LSTM Network Through Code
Analysis of LSTM cell behavior
The LSTM update can be described by the following equations:
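In the notation used by the code below, where \sigma is the logistic sigmoid, \odot denotes element-wise multiplication, and the pre-activation a_t is split into four H-wide blocks (a_i, a_f, a_o, a_g):

a_t = x_t W_x + h_{t-1} W_h + b
i = \sigma(a_i), \quad f = \sigma(a_f), \quad o = \sigma(a_o), \quad g = \tanh(a_g)
c_t = f \odot c_{t-1} + i \odot g
h_t = o \odot \tanh(c_t)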

One thing that feels novel is the use of element-wise multiplication as "gates" that decide how much of the data flow to keep or discard, which is loosely reminiscent of the activation step in a convolutional network.
For the backward pass, it is clearest to apply the chain rule and push the gradient back one variable at a time.

During backpropagation, pay special attention to the c_t node: it is both an output of the current step and an input to the step's other output h_t, so its gradient is the sum of two parts, the gradient passed back from the next timestep's cell state and the gradient flowing back through h_t.
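In symbols, with o and c_t as above:

\frac{\partial L}{\partial c_t} \;=\; \left.\frac{\partial L}{\partial c_t}\right|_{\text{from step } t+1} \;+\; \frac{\partial L}{\partial h_t} \odot o \odot \bigl(1 - \tanh^2(c_t)\bigr)

which is exactly the dnext_c += dnext_h*o*(1-np.tanh(next_c)**2) line in the backward code further down.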
Forward pass
Forward pass for a single LSTM timestep
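The step function below assumes that numpy has been imported as np and that a numerically stable sigmoid helper is defined earlier in the same file (the assignment's rnn_layers.py provides one; it is not reproduced in this post). A minimal sketch of such a helper, assuming it only has to act element-wise on float arrays:

import numpy as np

def sigmoid(x):
    # Numerically stable logistic sigmoid: only ever exponentiate non-positive values.
    pos_mask = (x >= 0)
    neg_mask = (x < 0)
    z = np.zeros_like(x, dtype=float)
    z[pos_mask] = np.exp(-x[pos_mask])
    z[neg_mask] = np.exp(x[neg_mask])
    top = np.ones_like(x, dtype=float)
    top[neg_mask] = z[neg_mask]
    return top / (1 + z)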
def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
    """
    Forward pass for a single timestep of an LSTM.

    The input data has dimension D, the hidden state has dimension H, and we
    use a minibatch size of N.

    Inputs:
    - x: Input data, of shape (N, D)
    - prev_h: Previous hidden state, of shape (N, H)
    - prev_c: Previous cell state, of shape (N, H)
    - Wx: Input-to-hidden weights, of shape (D, 4H)
    - Wh: Hidden-to-hidden weights, of shape (H, 4H)
    - b: Biases, of shape (4H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - next_c: Next cell state, of shape (N, H)
    - cache: Tuple of values needed for backward pass.
    """
    next_h, next_c, cache = None, None, None
    #############################################################################
    # TODO: Implement the forward pass for a single timestep of an LSTM.        #
    # You may want to use the numerically stable sigmoid implementation above.  #
    #############################################################################
    _, H = prev_h.shape
    # One big matrix multiply produces the pre-activations of all four gates.
    a = x.dot(Wx) + prev_h.dot(Wh) + b
    i = sigmoid(a[:, :H])         # input gate
    f = sigmoid(a[:, H:2*H])      # forget gate
    o = sigmoid(a[:, 2*H:3*H])    # output gate
    g = np.tanh(a[:, 3*H:])       # candidate cell values
    next_c = f * prev_c + i * g
    next_h = o * np.tanh(next_c)
    cache = [i, f, o, g, x, prev_h, prev_c, Wx, Wh, b, next_c]
    return next_h, next_c, cache
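A quick shape sanity check (just a sketch; the sizes N, D, H below are arbitrary):

np.random.seed(0)
N, D, H = 2, 3, 4
x = np.random.randn(N, D)
prev_h = np.random.randn(N, H)
prev_c = np.random.randn(N, H)
Wx = np.random.randn(D, 4 * H)
Wh = np.random.randn(H, 4 * H)
b = np.random.randn(4 * H)
next_h, next_c, cache = lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b)
print(next_h.shape, next_c.shape)   # both (N, H) = (2, 4)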
Forward pass for a full LSTM layer (all timesteps)
def lstm_forward(x, h0, Wx, Wh, b):
    """
    Forward pass for an LSTM over an entire sequence of data.

    We assume an input sequence composed of T vectors, each of dimension D.
    The LSTM uses a hidden size of H, and we work over a minibatch containing
    N sequences. After running the LSTM forward, we return the hidden states
    for all timesteps.

    Note that the initial hidden state is passed as input, but the initial
    cell state is set to zero. Also note that the cell state is not returned;
    it is an internal variable to the LSTM and is not accessed from outside.

    Inputs:
    - x: Input data of shape (N, T, D)
    - h0: Initial hidden state of shape (N, H)
    - Wx: Weights for input-to-hidden connections, of shape (D, 4H)
    - Wh: Weights for hidden-to-hidden connections, of shape (H, 4H)
    - b: Biases of shape (4H,)

    Returns a tuple of:
    - h: Hidden states for all timesteps of all sequences, of shape (N, T, H)
    - cache: Values needed for the backward pass.
    """
    h, cache = None, None
    #############################################################################
    # TODO: Implement the forward pass for an LSTM over an entire timeseries.   #
    # You should use the lstm_step_forward function that you just defined.      #
    #############################################################################
    N, T, D = x.shape
    next_c = np.zeros_like(h0)   # the initial cell state is all zeros
    next_h = h0
    h, cache = [], []
    for i in range(T):
        next_h, next_c, cache_step = lstm_step_forward(x[:, i, :], next_h, next_c, Wx, Wh, b)
        h.append(next_h)
        cache.append(cache_step)
    # After stacking, h has shape (T, N, H); transpose it to (N, T, H).
    h = np.array(h).transpose(1, 0, 2)
    return h, cache
Backward pass
Note that when backpropagation actually starts, the initial cell-state gradient is something we initialize ourselves (to zeros), whereas the hidden-state gradient is inherited from the layer above (the classification layer, or the hidden-to-vocabulary projection; in this respect the h path behaves exactly as in a plain RNN). A usage sketch illustrating this follows the lstm_backward code below.
Backward pass for a single LSTM timestep
def lstm_step_backward(dnext_h, dnext_c, cache):
    """
    Backward pass for a single timestep of an LSTM.

    Inputs:
    - dnext_h: Gradients of next hidden state, of shape (N, H)
    - dnext_c: Gradients of next cell state, of shape (N, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data, of shape (N, D)
    - dprev_h: Gradient of previous hidden state, of shape (N, H)
    - dprev_c: Gradient of previous cell state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dprev_h, dprev_c, dWx, dWh, db = None, None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for a single timestep of an LSTM.       #
    #                                                                           #
    # HINT: For sigmoid and tanh you can compute local derivatives in terms of  #
    # the output value from the nonlinearity.                                   #
    #############################################################################
    i, f, o, g, x, prev_h, prev_c, Wx, Wh, b, next_c = cache
    do = dnext_h * np.tanh(next_c)
    # As analyzed above, the cell state collects gradient from two paths:
    # the incoming dnext_c and the h = o * tanh(c) branch of this timestep.
    dnext_c += dnext_h * o * (1 - np.tanh(next_c) ** 2)
    di, df, dg, dprev_c = g * dnext_c, prev_c * dnext_c, i * dnext_c, f * dnext_c
    # Back through the nonlinearities to the gate pre-activations a = [a_i, a_f, a_o, a_g].
    da = np.concatenate([i * (1 - i) * di,
                         f * (1 - f) * df,
                         o * (1 - o) * do,
                         (1 - g ** 2) * dg], axis=1)
    db = np.sum(da, axis=0)
    dx, dWx, dprev_h, dWh = da.dot(Wx.T), x.T.dot(da), da.dot(Wh.T), prev_h.T.dot(da)
    return dx, dprev_h, dprev_c, dWx, dWh, db
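A quick way to convince yourself the step backward pass is correct is a finite-difference check. The assignment notebook uses its own eval_numerical_gradient_array helper for this; the sketch below instead reuses x, prev_h, prev_c, Wx, Wh, b, next_h and cache from the single-step sketch above and only checks dx:

dnext_h = np.random.randn(*next_h.shape)
dnext_c = np.zeros_like(prev_c)      # no gradient flows in through the cell state here
dx, dprev_h, dprev_c, dWx, dWh, db = lstm_step_backward(dnext_h, dnext_c, cache)

# Numeric gradient of the scalar loss sum(next_h * dnext_h) with respect to x.
eps = 1e-6
dx_num = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    orig = x[idx]
    x[idx] = orig + eps
    h_plus, _, _ = lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b)
    x[idx] = orig - eps
    h_minus, _, _ = lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b)
    x[idx] = orig
    dx_num[idx] = np.sum((h_plus - h_minus) * dnext_h) / (2 * eps)
print(np.max(np.abs(dx_num - dx)))   # should be tiny, on the order of 1e-8 or smaller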
Backward pass for a full LSTM layer (all timesteps)
def lstm_backward(dh, cache):
    """
    Backward pass for an LSTM over an entire sequence of data.

    Inputs:
    - dh: Upstream gradients of hidden states, of shape (N, T, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data of shape (N, T, D)
    - dh0: Gradient of initial hidden state of shape (N, H)
    - dWx: Gradient of input-to-hidden weight matrix of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weight matrix of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dh0, dWx, dWh, db = None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for an LSTM over an entire timeseries.  #
    # You should use the lstm_step_backward function that you just defined.     #
    #############################################################################
    N, T, H = dh.shape
    _, D = cache[0][4].shape   # x for the first timestep sits at index 4 of its cache entry
    dx = []
    dh0 = np.zeros((N, H), dtype='float32')
    dWx = np.zeros((D, 4 * H), dtype='float32')
    dWh = np.zeros((H, 4 * H), dtype='float32')
    db = np.zeros(4 * H, dtype='float32')
    step_dprev_h, step_dprev_c = np.zeros((N, H)), np.zeros((N, H))
    for i in range(T - 1, -1, -1):
        step_dx, step_dprev_h, step_dprev_c, step_dWx, step_dWh, step_db = \
            lstm_step_backward(dh[:, i, :] + step_dprev_h, step_dprev_c, cache[i])
        dx.append(step_dx)   # every input timestep gets its own gradient
        dWx += step_dWx      # the weights are shared across timesteps, so accumulate
        dWh += step_dWh      # the weights are shared across timesteps, so accumulate
        db += step_db        # the biases are shared across timesteps, so accumulate
    # Only the very first hidden state h0 (e.g. the projected image feature in
    # image captioning) needs its gradient stored.
    dh0 = step_dprev_h
    # dx was collected from t = T-1 down to 0; reverse to time order, then
    # transpose the stacked array from (T, N, D) to (N, T, D).
    dx = np.array(dx[::-1]).transpose((1, 0, 2))
    return dx, dh0, dWx, dWh, db
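Finally, a sketch of how the layer backward pass is typically driven, illustrating the point above: only dh arrives from the layer above (in the captioning model that would be the backward pass of the hidden-to-vocabulary projection), while the initial cell-state gradient simply starts at zero inside lstm_backward. The dimensions are again arbitrary:

N, T, D, H = 2, 5, 3, 4
x = np.random.randn(N, T, D)
h0 = np.random.randn(N, H)
Wx = np.random.randn(D, 4 * H)
Wh = np.random.randn(H, 4 * H)
b = np.random.randn(4 * H)
h, cache = lstm_forward(x, h0, Wx, Wh, b)
dh = np.random.randn(N, T, H)        # stand-in for the upstream gradient from the layer above
dx, dh0, dWx, dWh, db = lstm_backward(dh, cache)
print(dx.shape, dh0.shape, dWx.shape, dWh.shape, db.shape)
# (2, 5, 3) (2, 4) (3, 16) (4, 16) (16,)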