【CV】CVPR2015_A Discriminative CNN Video Representation for Event Detection
A Discriminative CNN Video Representation for Event Detection
Note here: it's a learning note on the topic of video representation, based on the paper below.
Link: http://arxiv.org/pdf/1411.4006v1.pdf
Motivation:
Improved Dense Trajectories (IDT) have led to good performance on the task of event detection, while CNN-based video representations have performed worse. The authors give three main reasons:
- Lack of labeled video data to train good models.
- Video-level event labels are too coarse to fine-tune a pre-trained model for the event detection task.
- Average pooling of frame-level CNN descriptors does not give a discriminative video-level representation, so it works worse than hand-crafted features like IDT.
Proposed Model:
This paper proposes a model that mainly targets the third problem, namely how to build a cost-efficient and discriminative video representation based on a CNN.

1) First, extract frame-level descriptors:
The M filters in the last convolutional layer are treated as M latent concept classifiers. Each convolutional filter corresponds to one latent concept and is applied at every location of the frame, so we get the responses of discriminative latent concepts at different locations of the frame.
After that, a max-pooling operation is applied to each concept's response map, and the responses of the different concepts at the same location are concatenated into one vector, so each vector describes the various concepts at that location.
At this point, the frame-level features have been extracted.
(Actually, nothing special is done at this step: they just give a new interpretation of the responses in a CNN and rearrange those responses for further processing.)
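The rearrangement described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function name `latent_concept_descriptors` and the toy values are made up for this note, and the feature map is assumed to be stored as M channels of H x W pooled responses.

```python
def latent_concept_descriptors(feature_map):
    """Rearrange an M x H x W feature map into H*W descriptors of length M.

    feature_map[m][row][col] is the pooled response of filter m
    (one latent concept) at spatial location (row, col).
    """
    num_filters = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    descriptors = []
    for row in range(height):
        for col in range(width):
            # Concatenate the responses of all concepts at this location.
            descriptors.append(
                [feature_map[m][row][col] for m in range(num_filters)]
            )
    return descriptors


# Toy pooled feature map: M = 3 filters on a 2 x 2 grid (values are made up).
toy_map = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 20.0], [30.0, 40.0]],
]
descs = latent_concept_descriptors(toy_map)
print(len(descs), descs[0])
```

Each frame then contributes H*W vectors of M dimensions, instead of one long flattened vector.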
2) Second, encode a discriminative video-level descriptor from all these frame-level descriptors:
They introduce and compare three different encoding methods in the paper.
However, as I'm not familiar with the mathematics behind them, I can only take a brief look at them here instead of going further.
- Fisher Vector Encoding (refer to: http://blog.csdn.net/breeze5428/article/details/32706507 & http://www.cnblogs.com/CBDoctor/archive/2011/11/06/2236286.html)
- VLAD Encoding (simplified version of Fisher Vector Encoding)
- Average Pooling
Through experiments, they find that VLAD works better than the other two encoding methods. (You can refer to the paper for details about the experiments.)
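To make the difference between VLAD and average pooling concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the codebook `centers` is assumed to have been learned beforehand (for example with k-means), the final L2 normalization is a common choice rather than something stated in this note, and all names and toy values are hypothetical.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vlad_encode(descriptors, centers):
    """Sum the residuals to the nearest center, concatenate, L2-normalize."""
    dim = len(centers[0])
    residual_sums = [[0.0] * dim for _ in centers]
    for d in descriptors:
        # Hard-assign the descriptor to its nearest codebook center.
        nearest = min(range(len(centers)),
                      key=lambda k: squared_distance(d, centers[k]))
        for i in range(dim):
            residual_sums[nearest][i] += d[i] - centers[nearest][i]
    flat = [v for block in residual_sums for v in block]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def average_pool(descriptors):
    """Baseline: element-wise mean over all descriptors."""
    n = len(descriptors)
    return [sum(column) / n for column in zip(*descriptors)]


# Toy data: 2-D descriptors from all frames and a 2-word codebook (made up).
centers = [[0.0, 0.0], [10.0, 10.0]]
frame_descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0], [10.0, 12.0]]

video_vlad = vlad_encode(frame_descriptors, centers)
video_avg = average_pool(frame_descriptors)
print(len(video_vlad), video_avg)
```

With K centers and D-dimensional descriptors, VLAD gives a K*D vector that keeps per-cluster information, while average pooling collapses everything into a single D-dimensional mean.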
"This is the first work on the video pooling of CNN descriptors and we broaden the encoding methods from local descriptors to CNN descriptors in video analysis."
(That's the takeaway of their work. They are the first to apply these encoding methods to CNN descriptors. Previously, most works used Fisher Vector Encoding to encode a global feature of an image from local descriptors like HOG, SIFT, HOF and so on.)
3) Lastly, we get a video-level descriptor and feed it into an SVM to do the detection task.
Two Tricks:
1) Spatial Pyramid Pooling: they apply four different CNN max-pooling operations to obtain more spatial locations for a single frame, which makes the descriptor more discriminative. This is also cheaper than applying a spatial pyramid to the raw frame.
2) Representation Compression: they use Product Quantization to compress the final representation while maintaining, or even slightly improving, the original performance.
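The first trick, pooling the same convolutional feature map at several grid sizes, can be sketched as follows. This is a rough illustration only: the grid sizes `(1, 2)`, the function names and the toy values are assumptions made for this note (the paper uses four pooling operations), and the window bounds follow the usual floor/ceil rule of adaptive pooling.

```python
def max_pool_to_grid(channel, grid):
    """Max-pool one H x W channel down to a grid x grid output."""
    height, width = len(channel), len(channel[0])
    pooled = []
    for gy in range(grid):
        y0 = (gy * height) // grid
        y1 = ((gy + 1) * height + grid - 1) // grid  # ceiling division
        row = []
        for gx in range(grid):
            x0 = (gx * width) // grid
            x1 = ((gx + 1) * width + grid - 1) // grid
            row.append(max(channel[y][x]
                           for y in range(y0, y1)
                           for x in range(x0, x1)))
        pooled.append(row)
    return pooled


def spatial_pyramid_descriptors(feature_map, grids):
    """Collect one M-dimensional descriptor per cell of every grid size."""
    descriptors = []
    for grid in grids:
        pooled = [max_pool_to_grid(channel, grid) for channel in feature_map]
        for gy in range(grid):
            for gx in range(grid):
                descriptors.append([p[gy][gx] for p in pooled])
    return descriptors


# Toy feature map: M = 2 filters on a 4 x 4 grid (values are made up).
toy_feature_map = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    [[0, 0, 0, 0], [0, 5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 7]],
]
pyramid = spatial_pyramid_descriptors(toy_feature_map, grids=(1, 2))
print(len(pyramid), pyramid)
```

Because the pyramid is built on the convolutional responses, the expensive convolution runs once per frame, and only the cheap pooling step is repeated for each grid size.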