【ML】ICLR2016_Delving Deeper into Convolutional Networks

Note: Ballas et al. recently proposed a novel framework for learning video representations. The following are my review notes after reading the paper.

Link: http://arxiv.org/pdf/1511.06432v4.pdf

[Brief introduction to some neural networks]

CNN: excellent at static image classification

RNN: can model temporal sequences in various learning tasks
(however, it suffers from the exploding or vanishing gradient problem)
---> LSTM/GRU were proposed to mitigate this problem

RCN: leverages properties of both CNN and RNN by using the CNN's top-level feature map as the input of the RNN; it has recently been introduced to learn video representations.

[Video representation]

Motivation:
Adopt RCN as the basic model.
- The top-level feature map carries high-level semantic features, but spatial nuances are lost after pooling.
- However, frame-to-frame temporal variation is known to be smooth, so these fine spatial details are key for action recognition from videos.
(we need a new model that addresses this problem)

[Proposed models]

GRU-RCN:
- replace the recurrent units in RCN with GRUs.

(z: update gate, decides to what degree the candidate state, versus the previous hidden state, contributes to the next hidden state)
(r: reset gate, decides how much of the previous hidden state is propagated into the candidate state)
(~h: candidate hidden state, it is blended with the previous hidden state through the update gate)
(h: final hidden state)
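
To make the four quantities above concrete, here is a minimal sketch of one step of a standard (fully-connected) GRU in plain Python. The weight names (Wz, Uz, Wr, Ur, W, U) and the helper functions are hypothetical, biases are omitted, and the blend h = (1 - z) * h_prev + z * ~h is one common convention; treat it as an illustration, not the paper's code.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gru_step(x, h_prev, p):
    """One GRU step. p is a dict of weight matrices (hypothetical names)."""
    # update gate z: how much of the candidate replaces the previous state
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h_prev))]
    # reset gate r: how much of the previous state feeds the candidate
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h_prev))]
    rh = [ri * hi for ri, hi in zip(r, h_prev)]
    # candidate hidden state ~h
    h_cand = [math.tanh(a + b) for a, b in zip(matvec(p["W"], x), matvec(p["U"], rh))]
    # final hidden state h: blend of previous state and candidate
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]
```

With all weights set to zero, both gates equal 0.5 and the candidate is 0, so the step simply halves the previous hidden state, which is a quick sanity check of the blending rule.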

Problems:
- the number of parameters in the fully-connected layers is huge because of the size of the conv map.
- fully-connected layers break the spatial structure of the conv map.
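
A rough count shows how quickly the first problem gets out of hand. The sketch below assumes the conv map is flattened into a vector and that the GRU has three input-to-hidden and three hidden-to-hidden matrices (for z, r and the candidate), with biases ignored. The function name and the 14 x 14 x 512 map size are illustrative assumptions, not figures from the paper.

```python
def fc_gru_params(height, width, c_in, c_hid):
    """Weight count of a fully-connected GRU on a flattened conv map (biases ignored)."""
    n_in = height * width * c_in      # flattened input size
    n_hid = height * width * c_hid    # flattened hidden size
    # three input-to-hidden and three hidden-to-hidden matrices (z, r, candidate)
    return 3 * (n_in * n_hid + n_hid * n_hid)

# Hypothetical example: a 14 x 14 map with 512 channels for both input and hidden state
print(fc_gru_params(14, 14, 512, 512))
```

For this hypothetical map size the count is on the order of 6 x 10^10 weights, and it grows with the square of the spatial resolution.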

Trick:
- replace the fully-connected units in the GRU with convolution operations, which keeps the spatial structure and reduces the number of parameters at the same time.
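
The trick can be sketched as a GRU step in which every matrix product becomes a 2D convolution. To stay short, this version works on a single channel with zero-padded "same" convolutions in plain Python; the kernel names and helpers are hypothetical and biases are omitted, so read it as an illustration of the idea only.

```python
import math

def conv2d_same(img, k):
    """Zero-padded 'same' 2D cross-correlation of one channel with an odd-sized kernel."""
    H, W, K = len(img), len(img[0]), len(k)
    pad = K // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for a in range(K):
                for b in range(K):
                    y, x = i + a - pad, j + b - pad
                    if 0 <= y < H and 0 <= x < W:
                        s += k[a][b] * img[y][x]
            out[i][j] = s
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def emap(f, *maps):
    """Apply f element-wise over equally sized 2D maps."""
    return [[f(*vals) for vals in zip(*rows)] for rows in zip(*maps)]

def conv_gru_step(x, h_prev, k):
    """One convolutional GRU step; k is a dict of 2D kernels (hypothetical names)."""
    z = emap(lambda a, b: sigmoid(a + b),
             conv2d_same(x, k["Wz"]), conv2d_same(h_prev, k["Uz"]))
    r = emap(lambda a, b: sigmoid(a + b),
             conv2d_same(x, k["Wr"]), conv2d_same(h_prev, k["Ur"]))
    rh = emap(lambda a, b: a * b, r, h_prev)
    h_cand = emap(lambda a, b: math.tanh(a + b),
                  conv2d_same(x, k["W"]), conv2d_same(rh, k["U"]))
    # the hidden state stays a 2D map, so the spatial layout is preserved
    return emap(lambda zi, hp, hc: (1 - zi) * hp + zi * hc, z, h_prev, h_cand)
```

In a multi-channel version with k x k kernels, the weight count would be 3 * k * k * (C_in * C_hid + C_hid * C_hid), which no longer depends on the height and width of the map.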

Intuition:
- we can see the propagation of hidden states as a process of convolution.
if so, the next hidden state perceives the spatial structure of all the previous states. As the sequence goes on, the receptive field over earlier states grows larger, so we retain only a general concept of the frames at the beginning.
- compared with our own cognitive system, this does make sense!
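
The growth of the receptive field can be worked out with simple arithmetic. The sketch below assumes stride-1 hidden-to-hidden convolutions with a square kernel and only follows the hidden-to-hidden path (it ignores the CNN's own receptive field); the function name is hypothetical.

```python
def temporal_receptive_field(steps_back, kernel_size):
    """Side length of the square region of the hidden state `steps_back` steps ago
    that can influence a single location of the current hidden state."""
    return 1 + steps_back * (kernel_size - 1)

# With 3 x 3 kernels the region widens by 2 pixels per time step
for s in range(1, 5):
    print(s, temporal_receptive_field(s, 3))
```

Under these assumptions a location in the current state sees a 3 x 3 patch of the previous state, a 5 x 5 patch of the one before, and so on, which matches the idea that early frames are only remembered at a coarse level.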

Stacked GRU-RCN:
- it applies L GRU-RCNs independently, one on each convolutional map.
- stack the L GRU-RCNs.
- feed the L final time-step hidden states into a classifier.
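
The data flow in the bullets above can be sketched as follows: one recurrent net per convolutional layer, each run over its own feature sequence, with the final hidden states concatenated for the classifier. All names are hypothetical, and the step functions are passed in as parameters so that any recurrent unit can stand in for the GRU. This covers only the independent-per-layer reading of the notes and does not model any connections between the recurrent layers.

```python
def run_recurrent(seq, step, h0):
    """Run one recurrent unit over a time sequence and return the final hidden state."""
    h = h0
    for x in seq:
        h = step(x, h)
    return h

def final_state_features(percepts, steps, h0s):
    """percepts[l] is the time sequence of layer-l feature vectors.
    Returns the concatenation of the L final hidden states (the classifier input)."""
    finals = [run_recurrent(seq, step, h0)
              for seq, step, h0 in zip(percepts, steps, h0s)]
    return [v for h in finals for v in h]

# Toy stand-in for a recurrent unit (NOT a GRU): average of input and previous state
def toy_step(x, h):
    return [0.5 * a + 0.5 * b for a, b in zip(x, h)]

percepts = [[[1.0], [1.0]],   # layer 0: two time steps, 1-dim features
            [[2.0, 0.0]]]     # layer 1: one time step, 2-dim features
print(final_state_features(percepts, [toy_step, toy_step], [[0.0], [0.0, 0.0]]))
```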
