【ML】ICLR2016_Delving Deeper into Convolutional Networks
Note: Ballas et al. recently proposed a novel framework for learning video representations; the following are my review notes after reading their paper.
Link: http://arxiv.org/pdf/1511.06432v4.pdf
[Brief introduction to some neural networks]
CNN: excellent at static image classification.
RNN: can model temporal sequences in various learning tasks
(however, it suffers from the vanishing/exploding gradient problem)
---> LSTM/GRU were proposed to mitigate this problem
RCN: leverages properties of both CNN and RNN by using the CNN's top-level feature map as the RNN input; it has recently been introduced to learn video representations.
[Video representation]
Motivation:
Adopt RCN as the base model.
- Top-level feature maps carry highly semantic features, but fine spatial nuances are lost after pooling.
- However, frame-to-frame temporal variations are known to be smooth and local, so these fine spatial details are key to action recognition from videos.
(so a new model is needed to address this problem)
[Proposed models]
GRU-RCN:
- replace the recurrent units in RCN with GRUs.

(z: update gate; decides to what degree the previous hidden state contributes to the next hidden state)
(r: reset gate; decides how much of the last hidden state is used when computing the candidate state)
(~h: candidate hidden state; it is passed through the update gate)
(h: new hidden state, a mix of the previous hidden state and ~h weighted by z)
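For reference, the four quantities above are related by the standard GRU update, written here with biases omitted, σ the logistic sigmoid and ⊙ the element-wise product. In GRU-RCN the products W x and U h become convolutions over the feature map.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1}\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1}\right) \\
\tilde{h}_t &= \tanh\left(W x_t + U\,(r_t \odot h_{t-1})\right) \\
h_t &= (1 - z_t)\, h_{t-1} + z_t\, \tilde{h}_t
\end{aligned}
```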

Problems:
- the number of parameters in the fully-connected GRU transitions is huge because of the size of the conv map.
- fully-connected layers break the spatial structure of the conv map.
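To make the first problem concrete, here is a small arithmetic sketch. The sizes are hypothetical (a 7×7×512 top-level map, the shape of a VGG-16 pool5 output on a 224×224 image, with a hidden state of the same size), and the function names are illustrative only. It compares one fully-connected input-to-hidden matrix with a 3×3 convolution that maps 512 channels to 512 channels.

```python
def fc_gru_matrix_params(height, width, channels, hidden_size):
    """Weights in ONE input-to-hidden matrix (e.g. W_z) of a fully-connected GRU
    whose input is the flattened height x width x channels conv map."""
    return height * width * channels * hidden_size


def conv_gru_kernel_params(k, in_channels, out_channels):
    """Weights in ONE k x k convolution kernel bank that replaces that matrix."""
    return k * k * in_channels * out_channels


# Hypothetical setting: 7x7x512 map, hidden state of the same (flattened) size.
H, W, C = 7, 7, 512
fc = fc_gru_matrix_params(H, W, C, hidden_size=H * W * C)
conv = conv_gru_kernel_params(k=3, in_channels=C, out_channels=C)
print(f"fully-connected W_z: {fc:,} weights")      # 629,407,744
print(f"3x3 convolutional W_z: {conv:,} weights")  # 2,359,296
```

A GRU has six such weight sets (W_z, W_r, W, U_z, U_r, U), so the gap grows accordingly; the convolutional count also does not depend on the map's height and width at all.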
Trick:
- replace the fully-connected units in the GRU with convolution operations, which keeps the spatial structure and reduces the number of parameters at the same time.
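A minimal pure-Python sketch of one convolutional GRU step, assuming zero-padded "same" convolutions with odd k×k kernels so the hidden state keeps the input map's height and width. The helper names (`conv2d_same`, `ew`, `conv_gru_step`, `init_params`) and the small random initialization are hypothetical illustration choices, not the paper's code; a real model would use a deep-learning framework.

```python
import math
import random


def conv2d_same(x, w):
    """Zero-padded 'same' 2-D cross-correlation (what DL code calls convolution).
    x: C_in maps of size H x W as nested lists x[c][i][j].
    w: w[o][c] is a k x k kernel (odd k) from input channel c to output channel o.
    Returns C_out maps of size H x W, so the spatial layout is preserved."""
    height, width = len(x[0]), len(x[0][0])
    pad = len(w[0][0]) // 2
    out = []
    for kernels in w:  # one output channel at a time
        fmap = [[0.0] * width for _ in range(height)]
        for c, kernel in enumerate(kernels):
            for i in range(height):
                for j in range(width):
                    for di, row in enumerate(kernel):
                        for dj, weight in enumerate(row):
                            ii, jj = i + di - pad, j + dj - pad
                            if 0 <= ii < height and 0 <= jj < width:
                                fmap[i][j] += weight * x[c][ii][jj]
        out.append(fmap)
    return out


def ew(f, *maps):
    """Apply f element-wise across equally shaped (C, H, W) nested lists."""
    return [[[f(*vals) for vals in zip(*rows)] for rows in zip(*chans)]
            for chans in zip(*maps)]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def conv_gru_step(x, h_prev, p):
    """One GRU step with every matrix product replaced by a convolution."""
    z = ew(lambda a, b: sigmoid(a + b), conv2d_same(x, p["Wz"]), conv2d_same(h_prev, p["Uz"]))
    r = ew(lambda a, b: sigmoid(a + b), conv2d_same(x, p["Wr"]), conv2d_same(h_prev, p["Ur"]))
    r_h = ew(lambda a, b: a * b, r, h_prev)  # reset gate applied element-wise
    h_cand = ew(lambda a, b: math.tanh(a + b), conv2d_same(x, p["W"]), conv2d_same(r_h, p["U"]))
    return ew(lambda zz, hp, hc: (1 - zz) * hp + zz * hc, z, h_prev, h_cand)


def init_params(c_in, c_h, k, seed=0):
    """Hypothetical small random init: W* kernels read the input, U* kernels read h."""
    rng = random.Random(seed)

    def bank(c_from):
        return [[[[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(k)]
                 for _ in range(c_from)] for _ in range(c_h)]

    return {name: bank(c_in if name.startswith("W") else c_h)
            for name in ("Wz", "Wr", "W", "Uz", "Ur", "U")}


# Toy run: 3 frames of a 2-channel 4x5 feature map, 3-channel hidden state.
C_IN, C_H, HEIGHT, WIDTH, K = 2, 3, 4, 5, 3
rng = random.Random(1)
frames = [[[[rng.random() for _ in range(WIDTH)] for _ in range(HEIGHT)] for _ in range(C_IN)]
          for _ in range(3)]
params = init_params(C_IN, C_H, K)
h = [[[0.0] * WIDTH for _ in range(HEIGHT)] for _ in range(C_H)]
for x_t in frames:
    h = conv_gru_step(x_t, h, params)
print(len(h), len(h[0]), len(h[0][0]))  # 3 4 5
```

The hidden state stays a stack of 4×5 maps rather than a flat vector, and the parameter count depends only on kernel size and channel counts.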
Intuition:
- the propagation of hidden states can be seen as a sequence of convolutions.
If so, each new hidden state perceives the spatial structure of all previous states. As the sequence goes on, the receptive field over earlier states grows larger, so only a general (coarse) impression of the first frames remains.
- Compared to our own cognitive system, this makes sense!
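A simplified sketch of this intuition, assuming a linear convolutional recurrence h_t = U * h_{t-1} + W * x_t with k-wide kernels on an unbounded one-dimensional map (gates and image borders are ignored, so this is an illustration, not the full GRU-RCN). Each later step widens the region of the final hidden state touched by an early frame by k - 1 cells, while the last frame is still seen through a k-wide window. The function names are hypothetical; the formula is checked against a brute-force simulation.

```python
def footprint_formula(num_steps, k):
    """Width of the region of h_T influenced by one input pixel of frame t, t = 1..T."""
    return [k + (num_steps - t) * (k - 1) for t in range(1, num_steps + 1)]


def footprint_simulated(num_steps, k, t_src):
    """Brute force: track which positions of h depend on pixel 0 of frame t_src."""
    pad = k // 2
    support = set()
    for t in range(1, num_steps + 1):
        # U * h_{t-1}: every influenced position spreads to its k neighbours
        support = {p + d for p in support for d in range(-pad, pad + 1)}
        if t == t_src:
            # W * x_t: the source pixel reaches k positions of h_t
            support |= set(range(-pad, pad + 1))
    return len(support)


T, K = 4, 3
print(footprint_formula(T, K))                                  # [9, 7, 5, 3]
print([footprint_simulated(T, K, t) for t in range(1, T + 1)])  # [9, 7, 5, 3]
```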
Stacked GRU-RCN:
- apply L GRU-RCNs independently, one on each of L convolutional maps.
- stack (tile up) the L GRU-RCNs.
- feed the L hidden states from the final time step into a classifier.
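The notes do not say how the L final hidden states are combined before classification. One plausible sketch, and only an assumption here: average-pool each final hidden map over space (so maps of different sizes from different CNN layers become fixed-length vectors), concatenate the L pooled vectors, and apply a linear softmax classifier. All names, shapes and weights below are hypothetical.

```python
import math


def spatial_avg_pool(h):
    """(C, H, W) nested list -> length-C vector of per-channel means."""
    return [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in h]


def classify(final_states, weights, biases):
    """final_states: the L hidden maps h^l_T, possibly with different H x W.
    Pools each to a vector, concatenates them, applies linear layer + softmax."""
    feats = [v for h in final_states for v in spatial_avg_pool(h)]
    logits = [sum(w_i * f for w_i, f in zip(row, feats)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical L = 2 final hidden states from two layers of different resolution.
h_layer1 = [[[0.5] * 4 for _ in range(4)] for _ in range(2)]    # C=2, 4x4
h_layer2 = [[[-0.25] * 2 for _ in range(2)] for _ in range(3)]  # C=3, 2x2
num_classes = 3
W_cls = [[0.1 * (i + j) for j in range(5)] for i in range(num_classes)]
b_cls = [0.0] * num_classes
probs = classify([h_layer1, h_layer2], W_cls, b_cls)
print(probs)
```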