【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis
Learning Temporal Embeddings for Complex Video Analysis
Note here: it's a review note on novel work from Feifei-Li's group about video representations, published on ICCV2015.
Motivation:
- Labeled video data is short for learning video representations, we need an unsupervised way.
- Context(temporal structure) is significant for video representations.
Proposed model:
- give one query frame, we can predict corresponding context representations(embeddings) of it through this model.
- Pipline:

\(f_{vj}(s_{vj};w_{e})\): embedding function
(\(W_{e}\) is the only parameter here we need to train for)
- Training:
\(h_{vj}=\frac{1}{2T}\sum_{t=1}^T(f_{vj+t}+f_{vj-t})\): context vector
Unsupervised learning objective (SVM Loss):
\(J(W_{e})=\sum_{v\in V}\sum_{S_{vj\in V},S\neq S_{vj}}max(0,1-(f_{vj}-f_{\_})\cdot h_{vj})\)
(\(f_{vj}\) is the embedding of frame \(S_{vj}\))
(\(f_{\_}\) is a negative frame which is not highly relevant to \(S_{vj}\))
(\(h_{vj}\) is the context embedding of frame \(S_{vj}\))
We’ll go further into the choosing of negative frames and context range later.
Intuition:
This model momorizes the context of specific frame. It utilizes the spatial appearance of the frame to form an embedding vector, which infers its context information.
Spatial feature learned from CNN \(\xrightarrow{\;\;\;W_{e}\;\;projection\;\;\;}\) Temporal feature embeds context
(\(W_{e}\) memorizes the temporal pattern during training)
With the temporal structure, even though some frames are not appearance similar, they can also be near in the feature space as long as they share similar context. Like following:

There’re two takeaways in the training process:
- Multi-resolution sampling: it’s hard to decide a generic context range(T), for videos own different paces, some may be quick while some are slow. This paper proposed a multi-resolution sampling strategy, instead of only sampling the context with same frame gap, it sampling with various gap lengths. That’s a trade-off between semantic relatedness and visual variaty.

- Hard Negative: choosing of negative samples are important for a robust model. It’s natural to come up with sampling negative frames in other videos and context frames from the same video, but this may cause the model overfit for some video-specific, less sementic properties, like lighting, camera characteristics and background. As a result, this paper also samples negative frames that are out of context range from the same video to avoid this problem.
【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis的更多相关文章
- 【CV】ICCV2015_Describing Videos by Exploiting Temporal Structure
Describing Videos by Exploiting Temporal Structure Note here: it's a learning note on the topic of v ...
- 【转载】Hierarchal Temporal Memory (HTM)
最近在看机器学习,看能否根据已有的历史来预测Hardware的故障发生概率.下文是一篇很有意思的文章,转自 http://numenta.org/htm.html. NuPIC是一个开源项目,用来实现 ...
- 【CV】ICCV2015_Unsupervised Learning of Spatiotemporally Coherent Metrics
Unsupervised Learning of Spatiotemporally Coherent Metrics Note here: it's a learning note on the to ...
- 【DB2】SQL0437W Performance for this complex query may be sub-optimal
参考链接 Technote (troubleshooting) Problem(Abstract) Error [IBM][CLI Driver][DB2/6000] SQL0437W Perform ...
- 【CV】CVPR2015_A Discriminative CNN Video Representation for Event Detection
A Discriminative CNN Video Representation for Event Detection Note here: it's a learning note on the ...
- 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...
- 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos
Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...
- 【题解】[USACO12JAN]视频游戏的连击Video Game Combos
好久没有写博客了,好惭愧啊……虽然这是一道弱题但还是写一下吧. 这道题目的思路应该说是很容易形成:字符串+最大值?自然联想到学过的AC自动机与DP.对于给定的字符串建立出AC自动机,dp状态dp[i] ...
- 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...
随机推荐
- 【PAT】B1056 组合数的和(15 分)
就看着代码量一直到没什么好说的了 #include<stdio.h> int main(){ int N,K;scanf("%d",&N); int sum=0 ...
- 2.2Python数据处理篇之---math模块的数学函数
目录 目录 前言 (一)一览表 1.基本函数 2.对数函数 3.三角函数 4.角度的切换 5.双曲函数 6.math定义的常数 (二)实例 目录 前言 math模块是基础的python数学函数模块,是 ...
- python--继承关系
如果子类中定义与父类同名的方法或属性,则自动会覆盖父类对应的方法或属性. 子类完全继承父类的实例 >>> class Parent: def setName(self): print ...
- Linux结构目录
linux结构目录 Linux中有一句话叫做:一切皆文件. 下面来了解一下这些文件. 首先看一下Linux根目录下结构: bin:存放二进制可执行文件,一般常用命令都存放在这里. boot:存放系统启 ...
- 一个非常好的php实现手机号归属地查询接口类
前一阵子看到了一个非常好的php手机归属地查询的类,写的很精简,查询也很精确!大致代码是这样的: <?php header("Content-type:text/html;charse ...
- 一图尽知XMIND
- [转]VC++宏与预处理使用方法总结
原文链接:VC 宏与预处理使用方法总结 原文链接:VC预处理指令与宏定义的妙用
- centos6安装tomcat8.5
//参考https://www.cnblogs.com/xdp-gacl/p/4097608.html [root@192 ~]# mount /dev/sr0 /mnt/usb1[root@192 ...
- 【Codeforces 1132D】Stressful Training
Codeforces 1132 D 题意:给\(n\)个电脑的电量和耗电速度,你可以买一个充电器,它的充电速度是每秒\(v\)单位,\(v\)你自己定.问最小的\(v\)能使得在\(k\)秒内每秒给某 ...
- android ImageLoader加载本地图片的工具类
import android.widget.ImageView; import com.nostra13.universalimageloader.core.ImageLoader; /** * 异步 ...