A Discriminative CNN Video Representation for Event Detection

Note here: it's a learning note on the topic of video representation, based on the paper below.

Link: http://arxiv.org/pdf/1411.4006v1.pdf

Motivation:

The use of improved Dense Trajectories (IDT) has led good performance on the task of event detection, while the performance of CNN based video representation is worse than that. The author argues the following three main reasons:

  • Lack of labeled video data to train good models.
  • Video level event labels are too coarse to finetune a pre-trained model for adapting the event detection task.
  • The use of average pooling to generate a discriminative video level representation from CNN frame level descriptors works worse than hand-crafted features like IDT.

Proposed Model:

This paper proposes a model mainly targetting at the third problem, namely how to build a cost-efficient and discrimintive video representation based on CNN.

1)     Firstly, we should extract frame-level descriptor:

We adopt M filters in the last convolutional layers as M latent concept classifiers. Each convolutional filter is corresponding to one latent concept, and each of it will apply on different location of the frame. So we’ll get responses of discrimintive latent concepts on different locations of the frame.

After that, we apply max-pooling operation on all concepts descriptors and concatenate different responses at the same location to form vectors each of which containing various concepts descriptions at this location.

By now, we’ve extract frame-level features.

(Actually, they didn’t do anything special at this step, they just give a new illustration of responses in CNN and rearrange those responses for further process.)

2)     Secondly, we need to encode a discrimintive video-level descriptor from all these frame-level descriptors:

They introduce and compare three different encoding methods in the paper.

However, as I’m not proficient in the mathematical meanings of them, I can just give a briefly look at them instead of going further.

Through experiment, they find out VLAD is better than other encoding methods. (You can refer to the paper for details about that experiment.)

"This is the first work on the video pooling of CNN descriptors and we broaden the encoding methods from local descriptors to CNN descriptors in video analysis."

(That's the takeaway in their work. They're the first to apply these encoding methods on the CNN descriptors. Previously, most of the works utilize Fisher Vector Encoding to encode a general feature of an image from local descriptors like HOG, SIFT, HOF and so on.)

3)     Lastly, we get a video-level descriptor and feed it into a SVM to do detection task.

Two Tricks:

1)     Spatial Pyramid Pooling: they apply four different CNN max-pooling operations to give more spatial locations for a single frame, which makes the descriptor more discrimintive. And that’s also more cost-friendly than applying spatial pyramid on raw frame.

2)     Representation Compression: they do Product Quntization to compress the final representation while still maintain or even slightly improve the original performance.

【CV】CVPR2015_A Discriminative CNN Video Representation for Event Detection的更多相关文章

  1. 论文阅读(Weilin Huang——【TIP2016】Text-Attentional Convolutional Neural Network for Scene Text Detection)

    Weilin Huang--[TIP2015]Text-Attentional Convolutional Neural Network for Scene Text Detection) 目录 作者 ...

  2. 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos

    Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...

  3. 【PSMA】Progressive Sample Mining and Representation Learning for One-Shot Re-ID

    目录 主要挑战 主要的贡献和创新点 提出的方法 总体框架与算法 Vanilla pseudo label sampling (PLS) PLS with adversarial learning Tr ...

  4. 【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis

    Learning Temporal Embeddings for Complex Video Analysis Note here: it's a review note on novel work ...

  5. 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction

    Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...

  6. 【CV】ICCV2015_Unsupervised Learning of Spatiotemporally Coherent Metrics

    Unsupervised Learning of Spatiotemporally Coherent Metrics Note here: it's a learning note on the to ...

  7. 【CV】ICCV2015_Describing Videos by Exploiting Temporal Structure

    Describing Videos by Exploiting Temporal Structure Note here: it's a learning note on the topic of v ...

  8. 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs

    Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...

  9. 【实战问题】【3】iPhone无法播放video标签中的视频

    问题:视频都是MP4格式,视频可以在手机上正常播放.video标签中的视频在安卓点击可以播放,但在iPhone无法播放 解决方案: 1,视频编码格式问题,具体iPhone手机支持的是哪些格式可见官方的 ...

随机推荐

  1. Python基础知识:while循环

    1.在循环中使用continue输出1-10之间的奇数 num=0 while num <10: num += 1 if num %2 == 0: #--%--运算符,相除返回余数 contin ...

  2. WPFのclipToBounds与maskToBounds的区别

    UIView.clipsToBounds : 让子 View 只显示父 View 的 Frame 部分,子视图超出frame的部分不显示,默认为NO,设置为YES就会把超出的部分裁掉: maskToB ...

  3. centos7下安装docker(9容器对资源的使用限制-内存)

                  一个docker Host上面会运行若干容器,每个容器都需要CPU,内存和IO资源.容器提供了控制分配多少CPU,内存给每个容器的机制,避免摸个容器因占用太多资源而影响其他 ...

  4. 基于Redis的INCR实现一个限流器

    模式:计数器 计数器是 Redis 的原子性自增操作可实现的最直观的模式了,它的想法相当简单:每当某个操作发生时,向 Redis 发送一个 INCR 命令. 比如在一个 web 应用程序中,如果想知道 ...

  5. 不可不知 DDoS的攻击原理与防御方法

    DoS攻击.DDoS攻击和DRDoS攻击相信大家已经早有耳闻了吧!DoS是Denial of Service的简写就是拒绝服务,而DDoS就是Distributed Denial of Service ...

  6. oracle 查询 磁盘使用率

    SELECT d.tablespace_name "Name",        TO_CHAR(NVL(a.bytes / 1024 / 1024 / 1024, 0), '99, ...

  7. day03---基本数据类型、运算符、与用户交互

    day03 基本数据类型: 数据类型指的是变量值的类型,变量值之所以区分类型,是因为变量值是用来记录一种事物的状态,而不同的事物有不同的事物状态,对应着也必须必须用不同的变量类型去衡量. 分类: 数字 ...

  8. linux终端神器kmux

    文章链接 https://www.cnblogs.com/rond/p/4466599.html http://cenalulu.github.io/linux/tmux/ https://www.c ...

  9. 分布式缓存技术redis系列(三)——redis高级应用(主从、事务与锁、持久化)

    上文<详细讲解redis数据结构(内存模型)以及常用命令>介绍了redis的数据类型以及常用命令,本文我们来学习下redis的一些高级特性. 安全性设置 设置客户端操作秘密 redis安装 ...

  10. vscode中iframe的使用

    vscode中iframe由于某种原因不支持本地页面,比如<iframe src ="index.html" width="300" height=&qu ...