Learning Temporal Embeddings for Complex Video Analysis

Note: this is a review of work from Fei-Fei Li's group on video representations, published at ICCV 2015.

Link: http://www.cv-foundation.org/openaccess/content_iccv_2015/html/Ramanathan_Learning_Temporal_Embeddings_ICCV_2015_paper.html

Motivation:

- Labeled video data is too scarce for learning video representations, so an unsupervised approach is needed.

- Context (temporal structure) is important for video representations.

Proposed model:

- Given a query frame, the model predicts an embedding that represents the frame's temporal context.

- Pipeline:

\(f_{vj}(S_{vj};W_{e})\): embedding function

(\(W_{e}\) is the only parameter we need to train)

- Training:

\(h_{vj}=\frac{1}{2T}\sum_{t=1}^{T}\left(f_{v,j+t}+f_{v,j-t}\right)\): context vector, the mean embedding of the \(T\) frames on either side of \(S_{vj}\)
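
To make the context vector concrete, here is a minimal NumPy sketch of the averaging step. The function name `context_vector` and the toy embeddings are made up for illustration; it assumes the per-frame embeddings of one video are already stacked in an array, one row per frame.

```python
import numpy as np

def context_vector(embeddings: np.ndarray, j: int, T: int) -> np.ndarray:
    """Average the embeddings of the T frames before and the T frames after frame j."""
    if j - T < 0 or j + T >= len(embeddings):
        raise ValueError("frame j needs T neighbours on both sides")
    before = embeddings[j - T:j]           # frames j-T ... j-1
    after = embeddings[j + 1:j + T + 1]    # frames j+1 ... j+T
    return (before.sum(axis=0) + after.sum(axis=0)) / (2 * T)

# Toy video: 5 frames with 2-D embeddings (made-up numbers).
frames = np.array([[0., 0.], [1., 1.], [5., 5.], [3., 3.], [4., 4.]])
h = context_vector(frames, j=2, T=2)
print(h)  # [2. 2.]
```

Note that the query frame's own embedding (row 2 here) does not enter the average; only its neighbours do.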

Unsupervised learning objective (SVM Loss):

\(J(W_{e})=\sum_{v\in V}\sum_{S_{vj}\in v,\;S_{-}\neq S_{vj}}\max\left(0,\,1-(f_{vj}-f_{-})\cdot h_{vj}\right)\)

(\(f_{vj}\) is the embedding of frame \(S_{vj}\))

(\(f_{-}\) is the embedding of a negative frame \(S_{-}\) that is not highly relevant to \(S_{vj}\))

(\(h_{vj}\) is the context embedding of frame \(S_{vj}\))
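
The inner term of the objective can be sketched for a single query frame as follows. This is only an illustration under simple assumptions: the name `hinge_ranking_loss` and the toy vectors are hypothetical, and the embeddings are treated as given rather than produced by a trained network.

```python
import numpy as np

def hinge_ranking_loss(f_pos: np.ndarray, f_negs: np.ndarray, h: np.ndarray) -> float:
    """Sum over negative frames of max(0, 1 - (f_pos - f_neg) . h)."""
    margins = 1.0 - (f_pos - f_negs) @ h   # one value per negative frame
    return float(np.maximum(0.0, margins).sum())

f_pos = np.array([1.0, 0.0])               # embedding of the query frame
h = np.array([1.0, 0.0])                   # its context vector
f_negs = np.array([[0.5, 0.0],             # scores close to the query: violates the margin
                   [-1.0, 0.0]])           # scores far below the query: satisfies the margin
print(hinge_ranking_loss(f_pos, f_negs, h))  # 0.5
```

Only the first negative contributes: its score against the context is within the margin of 1 of the query frame's score, so the loss pushes the two apart.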

We'll look at how the negative frames and the context range are chosen later.

Intuition:

This model memorizes the context of a specific frame. It uses the frame's spatial appearance to form an embedding vector from which the frame's context can be inferred.

Spatial feature learned by a CNN \(\xrightarrow{\;\;\;W_{e}\;\;\text{projection}\;\;\;}\) temporal feature that embeds context

(\(W_{e}\) memorizes the temporal pattern during training)

With temporal structure taken into account, frames that do not look alike can still lie close together in the feature space, as long as they share a similar context.

There are two takeaways from the training process:

- Multi-resolution sampling: it is hard to pick one generic context range (T), because videos have different paces; some are fast and some are slow. The paper proposes a multi-resolution sampling strategy: instead of sampling context frames with a single fixed frame gap, it samples them with gaps of various lengths. This is a trade-off between semantic relatedness and visual variety.

- Hard negatives: the choice of negative samples is important for a robust model. The natural approach is to sample negative frames from other videos and context frames from the same video, but this can make the model overfit to video-specific, less semantic properties such as lighting, camera characteristics and background. To avoid this, the paper also samples negative frames from the same video, taken from outside the context range.
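
The two sampling ideas above can be sketched roughly as follows. This is an illustrative guess at one way to implement them, not the paper's procedure: the function names, the candidate gaps, the `p_same_video` mixing probability and the choice of one random gap per training instance are all assumptions.

```python
import random

def sample_context(j, num_frames, T, gaps, rng):
    """Pick one gap at random and return it with the 2T context indices j +/- t*gap."""
    valid = [g for g in gaps if j - T * g >= 0 and j + T * g < num_frames]
    if not valid:
        raise ValueError("no gap fits around frame j")
    gap = rng.choice(valid)
    context = [j + sign * t * gap for t in range(1, T + 1) for sign in (-1, 1)]
    return gap, sorted(context)

def sample_negative(j, num_frames, context_radius, other_video_lengths,
                    p_same_video, rng):
    """Return ("same", frame_index) for a hard negative from the query's own video,
    or ("other", (video_index, frame_index)) for a frame from another video."""
    outside = [i for i in range(num_frames) if abs(i - j) > context_radius]
    if outside and rng.random() < p_same_video:
        return "same", rng.choice(outside)
    vid = rng.randrange(len(other_video_lengths))
    return "other", (vid, rng.randrange(other_video_lengths[vid]))

rng = random.Random(0)
gap, context = sample_context(j=50, num_frames=100, T=2, gaps=[1, 5, 10], rng=rng)
negative = sample_negative(j=50, num_frames=100, context_radius=2 * gap,
                           other_video_lengths=[30, 40], p_same_video=0.5, rng=rng)
print(gap, context, negative)
```

Small gaps give context frames that are visually close to the query, while large gaps give more varied frames that are less certain to be related. Same-video negatives share lighting and background with the query, so the model cannot separate them using those cues alone.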
