LSTM神经元行为分析

LSTM 公式可以描述如下:

itftotgtctht=sigmoid(Wixxt+Wihht−1+bi)=sigmoid(Wfxxt+Wfhht−1+bf)=sigmoid(Woxxt+Wohht−1+bo)=tanh(Wgxxt+Wghht−1+bg)=ft∘ct−1+it∘gt=ot∘ct

感觉比较新奇的一点是通过点乘矩阵使用‘门’控制数据流的取舍,和卷积神经网络的激活过程有一点点相似。

反向传播时,通过链式法则一个变量一个变量后推比较清晰。

反向传播时注意Ct节点,它既是本层的输出,也是本层另一个输出ht的输入节点,即它的梯度由两部分组成——上层回传梯度&ht反向传播梯度

向前传播

单个LSTM神经元向前传播

def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
"""
Forward pass for a single timestep of an LSTM. The input data has dimension D, the hidden state has dimension H, and we use
a minibatch size of N. Inputs:
- x: Input data, of shape (N, D)
- prev_h: Previous hidden state, of shape (N, H)
- prev_c: previous cell state, of shape (N, H)
- Wx: Input-to-hidden weights, of shape (D, 4H)
- Wh: Hidden-to-hidden weights, of shape (H, 4H)
- b: Biases, of shape (4H,) Returns a tuple of:
- next_h: Next hidden state, of shape (N, H)
- next_c: Next cell state, of shape (N, H)
- cache: Tuple of values needed for backward pass.
"""
next_h, next_c, cache = None, None, None
#############################################################################
# TODO: Implement the forward pass for a single timestep of an LSTM. #
# You may want to use the numerically stable sigmoid implementation above. #
#############################################################################
_, H = prev_h.shape
a = x.dot(Wx) + prev_h.dot(Wh) + b
i,f,o,g = sigmoid(a[:,:H]),sigmoid(a[:,H:2*H]),sigmoid(a[:,2*H:3*H]),np.tanh(a[:,3*H:])
next_c = f*prev_c + i*g
next_h = o*np.tanh(next_c)
cache = [i, f, o, g, x, prev_h, prev_c, Wx, Wh, b, next_c] return next_h, next_c, cache

层LSTM神经元向前传播

def lstm_forward(x, h0, Wx, Wh, b):
"""
Forward pass for an LSTM over an entire sequence of data. We assume an input
sequence composed of T vectors, each of dimension D. The LSTM uses a hidden
size of H, and we work over a minibatch containing N sequences. After running
the LSTM forward, we return the hidden states for all timesteps. Note that the initial cell state is passed as input, but the initial cell
state is set to zero. Also note that the cell state is not returned; it is
an internal variable to the LSTM and is not accessed from outside. Inputs:
- x: Input data of shape (N, T, D)
- h0: Initial hidden state of shape (N, H)
- Wx: Weights for input-to-hidden connections, of shape (D, 4H)
- Wh: Weights for hidden-to-hidden connections, of shape (H, 4H)
- b: Biases of shape (4H,) Returns a tuple of:
- h: Hidden states for all timesteps of all sequences, of shape (N, T, H)
- cache: Values needed for the backward pass.
"""
h, cache = None, None
#############################################################################
# TODO: Implement the forward pass for an LSTM over an entire timeseries. #
# You should use the lstm_step_forward function that you just defined. #
#############################################################################
N,T,D = x.shape
next_c = np.zeros_like(h0)
next_h = h0
h, cache = [], []
for i in range(T):
next_h, next_c, cache_step = lstm_step_forward(x[:,i,:], next_h, next_c, Wx, Wh, b)
h.append(next_h)
cache.append(cache_step)
h = np.array(h).transpose(1,0,2) #<-----------注意分析h存储后的维度是(T,N,H),需要转置为(N,T,H) return h, cache

反向传播

注意实际反向传播时,初始的C梯度是自己初始化的,而h梯度继承自高层(分类或者h到词袋的转化层,h层和RNN实际相同)

单个LSTM神经元反向传播

def lstm_step_backward(dnext_h, dnext_c, cache):
"""
Backward pass for a single timestep of an LSTM. Inputs:
- dnext_h: Gradients of next hidden state, of shape (N, H)
- dnext_c: Gradients of next cell state, of shape (N, H)
- cache: Values from the forward pass Returns a tuple of:
- dx: Gradient of input data, of shape (N, D)
- dprev_h: Gradient of previous hidden state, of shape (N, H)
- dprev_c: Gradient of previous cell state, of shape (N, H)
- dWx: Gradient of input-to-hidden weights, of shape (D, 4H)
- dWh: Gradient of hidden-to-hidden weights, of shape (H, 4H)
- db: Gradient of biases, of shape (4H,)
"""
dx, dprev_h, dprev_c, dWx, dWh, db = None, None, None, None, None, None
#############################################################################
# TODO: Implement the backward pass for a single timestep of an LSTM. #
# #
# HINT: For sigmoid and tanh you can compute local derivatives in terms of #
# the output value from the nonlinearity. #
#############################################################################
i, f, o, g, x, prev_h, prev_c, Wx, Wh, b, next_c = cache do = dnext_h*np.tanh(next_c)
dnext_c += dnext_h*o*(1-np.tanh(next_c)**2) #<-----------上面分析行为有提到这里的求法 di, df, dg, dprev_c = (g, prev_c, i, f) * dnext_c
da = np.concatenate([i*(1-i)*di, f*(1-f)*df, o*(1-o)*do, (1-g**2)*dg],axis=1) db = np.sum(da,axis=0)
dx, dWx, dprev_h, dWh = (da.dot(Wx.T), x.T.dot(da), da.dot(Wh.T), prev_h.T.dot(da)) return dx, dprev_h, dprev_c, dWx, dWh, db

层LSTM神经元反向传播

def lstm_backward(dh, cache):
"""
Backward pass for an LSTM over an entire sequence of data.] Inputs:
- dh: Upstream gradients of hidden states, of shape (N, T, H)
- cache: Values from the forward pass Returns a tuple of:
- dx: Gradient of input data of shape (N, T, D)
- dh0: Gradient of initial hidden state of shape (N, H)
- dWx: Gradient of input-to-hidden weight matrix of shape (D, 4H)
- dWh: Gradient of hidden-to-hidden weight matrix of shape (H, 4H)
- db: Gradient of biases, of shape (4H,)
"""
dx, dh0, dWx, dWh, db = None, None, None, None, None
#############################################################################
# TODO: Implement the backward pass for an LSTM over an entire timeseries. #
# You should use the lstm_step_backward function that you just defined. #
#############################################################################
N,T,H = dh.shape
_, D = cache[0][4].shape
dx, dh0, dWx, dWh, db = \
[], np.zeros((N, H), dtype='float32'), \
np.zeros((D, 4*H), dtype='float32'), np.zeros((H, 4*H), dtype='float32'), np.zeros(4*H, dtype='float32') step_dprev_h, step_dprev_c = np.zeros((N,H)),np.zeros((N,H))
for i in xrange(T-1, -1, -1):
step_dx, step_dprev_h, step_dprev_c, step_dWx, step_dWh, step_db = \
lstm_step_backward(dh[:,i,:] + step_dprev_h, step_dprev_c, cache[i])
dx.append(step_dx) # 每一个输入节点都有自己的梯度
dWx += step_dWx # 层共享参数,需要累加和
dWh += step_dWh # 层共享参数,需要累加和
db += step_db # 层共享参数,需要累加和
dh0 = step_dprev_h # 只有最初输入的h0,即feature的投影(图像标注中),需要存储梯度
dx = np.array(dx[::-1]).transpose((1,0,2)) return dx, dh0, dWx, dWh, db

『cs231n』作业3问题2选讲_通过代码理解LSTM网络的更多相关文章

  1. 『cs231n』作业3问题1选讲_通过代码理解RNN&图像标注训练

    一份不错的作业3资料(含答案) RNN神经元理解 单个RNN神经元行为 括号中表示的是维度 向前传播 def rnn_step_forward(x, prev_h, Wx, Wh, b): " ...

  2. 『cs231n』作业3问题3选讲_通过代码理解图像梯度

    Saliency Maps 这部分想探究一下 CNN 内部的原理,参考论文 Deep Inside Convolutional Networks: Visualising Image Classifi ...

  3. 『cs231n』作业3问题4选讲_图像梯度应用强化

    [注],本节(上节也是)的model是一个已经训练完成的CNN分类网络. 随机数图片向前传播后对目标类优化,反向优化图片本体 def create_class_visualization(target ...

  4. 『cs231n』作业2选讲_通过代码理解Dropout

    Dropout def dropout_forward(x, dropout_param): p, mode = dropout_param['p'], dropout_param['mode'] i ...

  5. 『cs231n』作业2选讲_通过代码理解优化器

    1).Adagrad一种自适应学习率算法,实现代码如下: cache += dx**2 x += - learning_rate * dx / (np.sqrt(cache) + eps) 这种方法的 ...

  6. 『cs231n』作业1选讲_通过代码理解KNN&交叉验证&SVM

    通过K近邻算法探究numpy向量运算提速 茴香豆的“茴”字有... ... 使用三种计算图片距离的方式实现K近邻算法: 1.最为基础的双循环 2.利用numpy的broadca机制实现单循环 3.利用 ...

  7. 『cs231n』通过代码理解风格迁移

    『cs231n』卷积神经网络的可视化应用 文件目录 vgg16.py import os import numpy as np import tensorflow as tf from downloa ...

  8. 『cs231n』计算机视觉基础

    线性分类器损失函数明细: 『cs231n』线性分类器损失函数 最优化Optimiz部分代码: 1.随机搜索 bestloss = float('inf') # 无穷大 for num in range ...

  9. 『TensorFlow』DCGAN生成动漫人物头像_下

    『TensorFlow』以GAN为例的神经网络类范式 『cs231n』通过代码理解gan网络&tensorflow共享变量机制_上 『TensorFlow』通过代码理解gan网络_中 一.计算 ...

随机推荐

  1. 【运维技术】JENKINS管道部署容器化初探

    目标服务器安装docker参考官方文档 https://docs.docker.com/install/linux/docker-ce/centos/ (可选)在目标服务器上安装docker私服 ht ...

  2. ELK学习笔记之Elasticsearch启动常见错误

    问题出现的环境: OS版本:CentOS-7-x86_64-Minimal-1708 ES版本:elasticsearch-6.2.2 1. max file descriptors [4096] f ...

  3. Python descriptor 以及 内置property()函数

    Python Descriptor  1, Python Descriptor是这样一个对象 它按照descriptor协议, 有这样的属性之一 def __get__(self, obj, type ...

  4. 散列表(HashTable)

    散列表 i. 散列函数 i. 冲突解决 ii. 分离链表法 ii. 开放地址法 iii. 线性探测法 iii. 平方探测法 iii. 双散列 ii. 再散列 ii. 可扩散列 i. 装填因子:元素个数 ...

  5. 20145101 《Java程序设计》第7周学习总结

    20145101<Java程序设计>第7周学习总结 教材学习内容总结 第十二章 Lambda Lambda表达式中this的参考对象以及toString()的接受者,是来自Lambda的周 ...

  6. AP与CP介绍【转】

    本文转载子:https://blog.csdn.net/wqlinf/article/details/8663170 基带芯片加协处理器(CP,通常是多媒体加速器).这类产品以MTK方案为典型代表,M ...

  7. linux下查找指定后缀的文件

    1.linux下查找指定后缀的文件 例如查找当前目录下的所有后缀名时.c或.h的文件 find  .  -type f -regex  ".*\.\(c\|h\)"

  8. ubuntu下交叉编译mono

    环境:ubuntu16.04 wget download.mono-project.com/sources/mono/mono-4.8.1.0.tar.bz2 配置: CC=arm-linux-you ...

  9. Linux 操作 mysql

    linux mysql 操作命令 [转 来源] 1.linux下启动mysql的命令:mysqladmin start/ect/init.d/mysql start (前面为mysql的安装路径) 2 ...

  10. 1、webpack编译打包Sass编译的css进js文件

    cnpm install css-loader --save-dev    //css-loader 是将css打包进js cnpm install style-loader --save-dev   ...