Analysis of LSTM cell behavior

The LSTM equations can be written as follows:

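$$
\begin{aligned}
i_t &= \mathrm{sigmoid}(W_{ix} x_t + W_{ih} h_{t-1} + b_i) \\
f_t &= \mathrm{sigmoid}(W_{fx} x_t + W_{fh} h_{t-1} + b_f) \\
o_t &= \mathrm{sigmoid}(W_{ox} x_t + W_{oh} h_{t-1} + b_o) \\
g_t &= \tanh(W_{gx} x_t + W_{gh} h_{t-1} + b_g) \\
c_t &= f_t \circ c_{t-1} + i_t \circ g_t \\
h_t &= o_t \circ \tanh(c_t)
\end{aligned}
$$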

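What struck me as novel is that the "gates" decide which data flows through by element-wise multiplication (the ∘ in the equations), which is slightly reminiscent of the activation step in a convolutional neural network.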
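To make the gating idea concrete, here is a minimal NumPy sketch. The numbers are made up for illustration and are not taken from the assignment: a gate with entries between 0 and 1 multiplies a candidate vector element by element, so entries near 0 are blocked and entries near 1 pass through.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
gate = np.array([0.0, 0.5, 1.0])         # e.g. what a sigmoid might output
candidate = np.array([2.0, -4.0, 6.0])   # arbitrary data to be gated

# Element-wise (Hadamard) product: each gate entry scales one candidate entry.
passed = gate * candidate
print(passed)
```

The first entry is fully blocked, the second is halved, and the third passes unchanged.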

For backpropagation, it is clearest to apply the chain rule and work backward one variable at a time.

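During backpropagation, pay attention to the c_t node: it is an output of the current step and also an input to the step's other output, h_t. Its gradient therefore has two parts: the gradient passed back from the next timestep, and the gradient backpropagated through h_t.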
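Written out, and assuming h_t = o_t ∘ tanh(c_t) as in the code, the two contributions to the cell-state gradient add up as follows. Here `dnext_c` and `dnext_h` are the names the code uses for the incoming gradients of c_t and h_t:

$$
\begin{aligned}
\frac{\partial L}{\partial c_t} &= \texttt{dnext\_c} + \texttt{dnext\_h} \circ o_t \circ \left(1 - \tanh^2(c_t)\right) \\
\frac{\partial L}{\partial i_t} &= \frac{\partial L}{\partial c_t} \circ g_t, \qquad
\frac{\partial L}{\partial f_t} = \frac{\partial L}{\partial c_t} \circ c_{t-1} \\
\frac{\partial L}{\partial g_t} &= \frac{\partial L}{\partial c_t} \circ i_t, \qquad
\frac{\partial L}{\partial c_{t-1}} = \frac{\partial L}{\partial c_t} \circ f_t
\end{aligned}
$$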

Forward pass

Forward pass for a single LSTM cell (one timestep)

def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
    """
    Forward pass for a single timestep of an LSTM.

    The input data has dimension D, the hidden state has dimension H, and we use
    a minibatch size of N.

    Inputs:
    - x: Input data, of shape (N, D)
    - prev_h: Previous hidden state, of shape (N, H)
    - prev_c: previous cell state, of shape (N, H)
    - Wx: Input-to-hidden weights, of shape (D, 4H)
    - Wh: Hidden-to-hidden weights, of shape (H, 4H)
    - b: Biases, of shape (4H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - next_c: Next cell state, of shape (N, H)
    - cache: Tuple of values needed for backward pass.
    """
    next_h, next_c, cache = None, None, None
    #############################################################################
    # TODO: Implement the forward pass for a single timestep of an LSTM.        #
    # You may want to use the numerically stable sigmoid implementation above.  #
    #############################################################################
    # Assumes numpy is imported as np and the assignment's sigmoid helper is in scope.
    _, H = prev_h.shape
    # All four gate pre-activations in one matrix product, shape (N, 4H).
    a = x.dot(Wx) + prev_h.dot(Wh) + b
    i = sigmoid(a[:, :H])         # input gate
    f = sigmoid(a[:, H:2*H])      # forget gate
    o = sigmoid(a[:, 2*H:3*H])    # output gate
    g = np.tanh(a[:, 3*H:])       # candidate cell update
    next_c = f * prev_c + i * g
    next_h = o * np.tanh(next_c)
    cache = [i, f, o, g, x, prev_h, prev_c, Wx, Wh, b, next_c]
    return next_h, next_c, cache
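The slicing of the pre-activation `a` into four blocks of width H can also be written with `np.split`. The sketch below uses random stand-in data, not the assignment's tensors, and suggests the two forms give the same blocks; which one to use is a matter of taste.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H = 3, 4
# Stand-in for x.dot(Wx) + prev_h.dot(Wh) + b, shape (N, 4H).
a = rng.standard_normal((N, 4 * H))

# Manual slicing, in the order i, f, o, g used by lstm_step_forward.
ai, af, ao, ag = a[:, :H], a[:, H:2*H], a[:, 2*H:3*H], a[:, 3*H:]

# Equivalent: split into four equal blocks along the feature axis.
si, sf, so, sg = np.split(a, 4, axis=1)
```

Each block has shape (N, H), and the order of the blocks must match the column layout that `Wx`, `Wh`, and `b` were built with.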

Forward pass for the LSTM layer (whole sequence)

def lstm_forward(x, h0, Wx, Wh, b):
    """
    Forward pass for an LSTM over an entire sequence of data. We assume an input
    sequence composed of T vectors, each of dimension D. The LSTM uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running
    the LSTM forward, we return the hidden states for all timesteps.

    Note that the initial hidden state is passed as input, but the initial cell
    state is set to zero. Also note that the cell state is not returned; it is
    an internal variable to the LSTM and is not accessed from outside.

    Inputs:
    - x: Input data of shape (N, T, D)
    - h0: Initial hidden state of shape (N, H)
    - Wx: Weights for input-to-hidden connections, of shape (D, 4H)
    - Wh: Weights for hidden-to-hidden connections, of shape (H, 4H)
    - b: Biases of shape (4H,)

    Returns a tuple of:
    - h: Hidden states for all timesteps of all sequences, of shape (N, T, H)
    - cache: Values needed for the backward pass.
    """
    h, cache = None, None
    #############################################################################
    # TODO: Implement the forward pass for an LSTM over an entire timeseries.   #
    # You should use the lstm_step_forward function that you just defined.      #
    #############################################################################
    N, T, D = x.shape
    next_c = np.zeros_like(h0)  # the initial cell state is all zeros
    next_h = h0
    h, cache = [], []
    for t in range(T):
        next_h, next_c, cache_step = lstm_step_forward(x[:, t, :], next_h, next_c, Wx, Wh, b)
        h.append(next_h)
        cache.append(cache_step)
    # Note: the stacked list has shape (T, N, H); transpose it to (N, T, H).
    h = np.array(h).transpose(1, 0, 2)
    return h, cache
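The final reshaping step is easy to get wrong, so here is a small standalone sketch with made-up per-timestep arrays. It shows that converting the list to an array and transposing yields the same (N, T, H) result as `np.stack(..., axis=1)`, which is an alternative and not what the code above uses.

```python
import numpy as np

N, T, H = 2, 3, 4
# Hypothetical per-timestep hidden states: step t is filled with the value t.
steps = [np.full((N, H), float(t)) for t in range(T)]

# As in lstm_forward: list -> (T, N, H) array -> transpose to (N, T, H).
h_transposed = np.array(steps).transpose(1, 0, 2)

# Alternative: stack along a new axis 1 to build (N, T, H) directly.
h_stacked = np.stack(steps, axis=1)
```

Indexing `h_stacked[n, t]` then returns the hidden state of sequence n at timestep t.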

Backward pass

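Note that in the actual backward pass, the initial cell-state gradient is something we initialize ourselves (to zero), while the gradient of h is inherited from the layer above (the classifier, or the layer that maps h to vocabulary scores; the handling of h is effectively the same as in a vanilla RNN).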

Backward pass for a single LSTM cell (one timestep)

def lstm_step_backward(dnext_h, dnext_c, cache):
    """
    Backward pass for a single timestep of an LSTM.

    Inputs:
    - dnext_h: Gradients of next hidden state, of shape (N, H)
    - dnext_c: Gradients of next cell state, of shape (N, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data, of shape (N, D)
    - dprev_h: Gradient of previous hidden state, of shape (N, H)
    - dprev_c: Gradient of previous cell state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dprev_h, dprev_c, dWx, dWh, db = None, None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for a single timestep of an LSTM.       #
    #                                                                           #
    # HINT: For sigmoid and tanh you can compute local derivatives in terms of  #
    # the output value from the nonlinearity.                                   #
    #############################################################################
    i, f, o, g, x, prev_h, prev_c, Wx, Wh, b, next_c = cache
    tanh_c = np.tanh(next_c)
    do = dnext_h * tanh_c
    # c_t receives gradient from two paths (see the analysis at the top):
    # directly from the next timestep, and through h_t = o * tanh(c_t).
    # A new array is built here so the caller's dnext_c is not modified in place.
    dc = dnext_c + dnext_h * o * (1 - tanh_c ** 2)
    di, df, dg, dprev_c = dc * g, dc * prev_c, dc * i, dc * f
    # Back through the nonlinearities, using their outputs for the local derivatives.
    da = np.concatenate([i*(1-i)*di, f*(1-f)*df, o*(1-o)*do, (1-g**2)*dg], axis=1)
    db = np.sum(da, axis=0)
    dx, dWx = da.dot(Wx.T), x.T.dot(da)
    dprev_h, dWh = da.dot(Wh.T), prev_h.T.dot(da)
    return dx, dprev_h, dprev_c, dWx, dWh, db
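One way to gain confidence in a step-level backward pass is a numerical gradient check. The sketch below is self-contained and makes several assumptions: `step_forward` and `step_backward` are compact re-implementations written for this example (not the assignment's functions), the sigmoid is the plain formula instead of the numerically stable helper, and all data is random. It compares the analytic `dx` against centered finite differences.

```python
import numpy as np

def sigmoid(z):
    # Plain sigmoid; fine for the small random values used here.
    return 1.0 / (1.0 + np.exp(-z))

def step_forward(x, prev_h, prev_c, Wx, Wh, b):
    H = prev_h.shape[1]
    a = x.dot(Wx) + prev_h.dot(Wh) + b
    i, f, o = sigmoid(a[:, :H]), sigmoid(a[:, H:2*H]), sigmoid(a[:, 2*H:3*H])
    g = np.tanh(a[:, 3*H:])
    next_c = f * prev_c + i * g
    next_h = o * np.tanh(next_c)
    return next_h, next_c, (i, f, o, g, x, prev_h, prev_c, Wx, Wh, next_c)

def step_backward(dnext_h, dnext_c, cache):
    i, f, o, g, x, prev_h, prev_c, Wx, Wh, next_c = cache
    tanh_c = np.tanh(next_c)
    do = dnext_h * tanh_c
    dc = dnext_c + dnext_h * o * (1 - tanh_c ** 2)  # two gradient paths into c_t
    di, df, dg, dprev_c = dc * g, dc * prev_c, dc * i, dc * f
    da = np.concatenate([i*(1-i)*di, f*(1-f)*df, o*(1-o)*do, (1-g**2)*dg], axis=1)
    return (da.dot(Wx.T), da.dot(Wh.T), dprev_c,
            x.T.dot(da), prev_h.T.dot(da), da.sum(axis=0))

rng = np.random.default_rng(0)
N, D, H = 2, 3, 4
x = rng.standard_normal((N, D))
prev_h = rng.standard_normal((N, H))
prev_c = rng.standard_normal((N, H))
Wx = rng.standard_normal((D, 4 * H))
Wh = rng.standard_normal((H, 4 * H))
b = rng.standard_normal(4 * H)
dnext_h = rng.standard_normal((N, H))
dnext_c = rng.standard_normal((N, H))

def loss(x_):
    # Scalar whose gradients w.r.t. next_h and next_c are dnext_h and dnext_c.
    nh, nc, _ = step_forward(x_, prev_h, prev_c, Wx, Wh, b)
    return np.sum(nh * dnext_h) + np.sum(nc * dnext_c)

_, _, cache = step_forward(x, prev_h, prev_c, Wx, Wh, b)
dx = step_backward(dnext_h, dnext_c, cache)[0]

# Centered finite differences, one input entry at a time.
eps = 1e-5
dx_num = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    xp, xm = x.copy(), x.copy()
    xp[idx] += eps
    xm[idx] -= eps
    dx_num[idx] = (loss(xp) - loss(xm)) / (2 * eps)

max_abs_err = np.max(np.abs(dx - dx_num))
```

A small `max_abs_err` suggests the analytic gradient is consistent with the forward pass; the same loop can be repeated for `prev_h`, `prev_c`, `Wx`, `Wh`, and `b`.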

Backward pass for the LSTM layer (whole sequence)

def lstm_backward(dh, cache):
    """
    Backward pass for an LSTM over an entire sequence of data.

    Inputs:
    - dh: Upstream gradients of hidden states, of shape (N, T, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data of shape (N, T, D)
    - dh0: Gradient of initial hidden state of shape (N, H)
    - dWx: Gradient of input-to-hidden weight matrix of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weight matrix of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dh0, dWx, dWh, db = None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for an LSTM over an entire timeseries.  #
    # You should use the lstm_step_backward function that you just defined.     #
    #############################################################################
    N, T, H = dh.shape
    _, D = cache[0][4].shape  # cache[t][4] is the input x at timestep t, shape (N, D)
    # Accumulators use NumPy's default float64 so float64 step gradients are not downcast.
    dx, dh0 = [], np.zeros((N, H))
    dWx, dWh, db = np.zeros((D, 4 * H)), np.zeros((H, 4 * H)), np.zeros(4 * H)
    step_dprev_h, step_dprev_c = np.zeros((N, H)), np.zeros((N, H))
    for t in range(T - 1, -1, -1):  # range replaces the Python 2 xrange
        step_dx, step_dprev_h, step_dprev_c, step_dWx, step_dWh, step_db = \
            lstm_step_backward(dh[:, t, :] + step_dprev_h, step_dprev_c, cache[t])
        dx.append(step_dx)   # every input timestep has its own gradient
        dWx += step_dWx      # parameters are shared across timesteps, so sum
        dWh += step_dWh      # parameters are shared across timesteps, so sum
        db += step_db        # parameters are shared across timesteps, so sum
        # Only the gradient of the initial h0 (the projected image feature, in
        # image captioning) needs to be kept; the last iteration (t = 0) sets it.
        dh0 = step_dprev_h
    # dx was collected from t = T-1 down to 0: reverse it, then go (T, N, D) -> (N, T, D).
    dx = np.array(dx[::-1]).transpose((1, 0, 2))
    return dx, dh0, dWx, dWh, db
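The reason the weight gradients are summed over timesteps is that the same matrix is used at every step. A toy sketch, assuming a plain linear map y_t = x_t.dot(W) with random data (not the LSTM itself), shows that accumulating per-timestep gradients matches the gradient obtained by treating all N*T rows as one batch.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, D, M = 2, 3, 4, 5
x = rng.standard_normal((N, T, D))
dy = rng.standard_normal((N, T, M))  # upstream gradient of y_t = x_t.dot(W)

# Per-timestep accumulation, as done for dWx, dWh, and db in lstm_backward.
dW_loop = np.zeros((D, M))
for t in range(T):
    dW_loop += x[:, t, :].T.dot(dy[:, t, :])

# Same gradient from a single matrix product over all N*T rows.
dW_flat = x.reshape(N * T, D).T.dot(dy.reshape(N * T, M))
```

The LSTM cannot be flattened this way because each step depends on the previous one, which is why the loop with running sums is used there.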
