Two-Stream Convolutional Networks for Action Recognition in Videos

&

Towards Good Practices for Very Deep Two-Stream ConvNets

Note here: it's a learning note on the topic of video representations. This note incorporates two papers about popular two-stream architecture.

Link: http://arxiv.org/pdf/1406.2199v2.pdf

http://arxiv.org/pdf/1507.02159v1.pdf

Motivation: CNN has significantly boosted the performance of object recognition in still images. However, the use of it for video recognition with stacked frames doesn’t outperform the one with individual frame (work by Karpathy), which indicates traditional way of adapting CNN to video clips doesn’t capture the motion well.

Proposed Model:

In order to learn the spatio-temporal features well, this paper proposed a two-stream architecture for video recognition. It passes the spatial information (single static RGB frame) and another temporal information (optical flow of multiple frames) through the ConvNet. Then fuse the parallel outputs of two streams to form the final class score fusion.

The overall pipeline is shown below:

-      ConvNet input configurations:

There are some options for the input of temporal stream. The author discussed about utilizing optical flow stacking and trajectory stacking as motional information. The former one considers displacements of each point between consecutive frames, while the latter one focuses on the displacements of every point in the initial frame throughout the entire sequences.

They also mentioned bi-directional optical flow to enhance the capacity of video representations; and mean flow subtraction to avoid the influences of camera motion.

Visualization:

The visualization of filters in this architecture is shown below.

Each column corresponds to a filter, each row – to an input channel.

As we can draw from the image, one single filter composed with half black and half white means to compute spatial derivative; and the filters in a column with black turning into white gradually means to compute temporal derivative.

With the intuition above, we can see how the two-stream architecture captures the spatio-temporal features well.

Improvements:

There is another paper named Towards Good Practices for Very Deep Two-Stream ConvNets, which improves the efficiency of two-stream model in practice.

They argue that previous two-stream model didn’t significantly outperform other hand-crafted features for the mainly two reasons: first, the network is not deep enough as VGGNet&GoogLeNet; second, the lack of plenty training data limits its performance.

Thus, they proposed some suggestions to learn a more powerful two-stream model:

-      Pre-training for Two-stream ConvNets: pre-train both spatial and temporal nets on ImageNet.

-      Smaller Learning Rate.

-      More Data Augmentation Techniques

-      High Dropout Ratio: make the training of deep network with small amount of data easier.

-      Multi-GPU training.

【ML】Two-Stream Convolutional Networks for Action Recognition in Videos的更多相关文章

  1. 【CV论文阅读】Two stream convolutional Networks for action recognition in Vedios

    论文的三个贡献 (1)提出了two-stream结构的CNN,由空间和时间两个维度的网络组成. (2)使用多帧的密集光流场作为训练输入,可以提取动作的信息. (3)利用了多任务训练的方法把两个数据集联 ...

  2. 【ML】ICLR2016_Delving Deeper into Convolutional Networks

    ICLR2016_DELVING DEEPER INTO CONVOLUTIONAL NETWORKS Note here: Ballas recently proposed a novel fram ...

  3. 目标检测--Spatial pyramid pooling in deep convolutional networks for visual recognition(PAMI, 2015)

    Spatial pyramid pooling in deep convolutional networks for visual recognition 作者: Kaiming He, Xiangy ...

  4. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

    Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition Kaiming He, Xiangyu Zh ...

  5. SPPNet论文翻译-空间金字塔池化Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

    http://www.dengfanxin.cn/?p=403 原文地址 我对物体检测的一篇重要著作SPPNet的论文的主要部分进行了翻译工作.SPPNet的初衷非常明晰,就是希望网络对输入的尺寸更加 ...

  6. 深度学习论文翻译解析(九):Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

    论文标题:Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition 标题翻译:用于视觉识别的深度卷积神 ...

  7. 【Semantic Segmentation】 Instance-sensitive Fully Convolutional Networks论文解析(转)

    这篇文章比较简单,但还是不想写overview,转自: https://blog.csdn.net/zimenglan_sysu/article/details/52451098 另外,读这篇pape ...

  8. 【注意力机制】Attention Augmented Convolutional Networks

    注意力机制之Attention Augmented Convolutional Networks 原始链接:https://www.yuque.com/lart/papers/aaconv 核心内容 ...

  9. 【ML】Predict and Constrain: Modeling Cardinality in Deep Structured Prediction -预测和约束:在深度结构化预测中建模基数

    [论文标题]Predict and Constrain: Modeling Cardinality in Deep Structured Prediction   (35th-ICML,PMLR) [ ...

随机推荐

  1. tidb在DDL语句方面的测试

    Mysql与tidb测试数据为8000万行. 1.修改一个字段的列名,比如将“ctime”修改为“cctime”. Tidb测试: MySQL测试: 2.同一属性之间切换,即修改一个字段的属性大小.比 ...

  2. ccf题库中2015年12月2号消除类游戏

    题目如下: 问题描述 消除类游戏是深受大众欢迎的一种游戏,游戏在一个包含有n行m列的游戏棋盘上进行,棋盘的每一行每一列的方格上放着一个有颜色的棋子,当一行或一列上有连续三个或更多的相同颜色的棋子时,这 ...

  3. 理解OSI参考模型

    在一个视频网站上不小心搜到网络知识的视频,突然以前大学的没有真正接受的知识点,一下子豁然开朗,赶紧整理了下笔记. 一.OSI参考模型 自下而上:物理层(物理介质,比特流).数据链路层(网卡.交换机). ...

  4. Tronado自定义Form组件

    Tronado自定义Form组件 一.获取类里面的静态属性以及动态属性的方法 方式一: # ===========方式一================ class Foo(object): user ...

  5. [题目] Luogu P5038 [SCOI2012]奇怪的游戏

    学习资料 -----1----- -----2----- P5038 [SCOI2012]奇怪的游戏 一道甚神但没用到高深模型的题 思路 没思路,看题解吧 代码 #include <iostre ...

  6. 2190: [SDOI2008]仪仗队

    Description 作为体育委员,C君负责这次运动会仪仗队的训练.仪仗队是由学生组成的N * N的方阵,为了保证队伍在行进中整齐划一,C君会跟在仪仗队的左后方,根据其视线所及的学生人数来判断队伍是 ...

  7. javascript高级选择器querySelector和querySelectorAll

    querySelector 和 querySelectorAll 方法是 W3C Selectors API规范中定义的.他们的作用是根据 CSS 选择器规范,便捷定位文档中指定元素. 目前几乎主流浏 ...

  8. .Net修改网站项目调试时的虚拟目录(未验证)

    有些项目需要在IIS发布的时候,将网站发布到虚拟目录,为了保持调试和发布的路径同一,一般会修改VS调试的虚拟目录 一.Web应用程序 Web应用程序的修改方式非常简单,在解决方案资源管理器->项 ...

  9. docker 仓库搭建

    阿里云服务器: 127.0.0.1(客户端) 127.0.0.2(私有服务器) 127.0.0.2作为私有仓库使用 1.下载镜像 [root@insure ~]# docker pull regist ...

  10. Python脱产8期 Day03 2019/4/15

    一 变量的命名规范 1.只能由 字母, 数字,  _, 组成. 2. 不能以数字开头 3.避免与系统关键字重名:重名不会报错,但系统的功能就被自定义的功能屏蔽掉了(严重不建议这样来做) 4.以_开头的 ...