【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis
Learning Temporal Embeddings for Complex Video Analysis
Note here: it's a review note on novel work from Feifei-Li's group about video representations, published on ICCV2015.
Motivation:
- Labeled video data is short for learning video representations, we need an unsupervised way.
- Context(temporal structure) is significant for video representations.
Proposed model:
- give one query frame, we can predict corresponding context representations(embeddings) of it through this model.
- Pipline:

\(f_{vj}(s_{vj};w_{e})\): embedding function
(\(W_{e}\) is the only parameter here we need to train for)
- Training:
\(h_{vj}=\frac{1}{2T}\sum_{t=1}^T(f_{vj+t}+f_{vj-t})\): context vector
Unsupervised learning objective (SVM Loss):
\(J(W_{e})=\sum_{v\in V}\sum_{S_{vj\in V},S\neq S_{vj}}max(0,1-(f_{vj}-f_{\_})\cdot h_{vj})\)
(\(f_{vj}\) is the embedding of frame \(S_{vj}\))
(\(f_{\_}\) is a negative frame which is not highly relevant to \(S_{vj}\))
(\(h_{vj}\) is the context embedding of frame \(S_{vj}\))
We’ll go further into the choosing of negative frames and context range later.
Intuition:
This model momorizes the context of specific frame. It utilizes the spatial appearance of the frame to form an embedding vector, which infers its context information.
Spatial feature learned from CNN \(\xrightarrow{\;\;\;W_{e}\;\;projection\;\;\;}\) Temporal feature embeds context
(\(W_{e}\) memorizes the temporal pattern during training)
With the temporal structure, even though some frames are not appearance similar, they can also be near in the feature space as long as they share similar context. Like following:

There’re two takeaways in the training process:
- Multi-resolution sampling: it’s hard to decide a generic context range(T), for videos own different paces, some may be quick while some are slow. This paper proposed a multi-resolution sampling strategy, instead of only sampling the context with same frame gap, it sampling with various gap lengths. That’s a trade-off between semantic relatedness and visual variaty.

- Hard Negative: choosing of negative samples are important for a robust model. It’s natural to come up with sampling negative frames in other videos and context frames from the same video, but this may cause the model overfit for some video-specific, less sementic properties, like lighting, camera characteristics and background. As a result, this paper also samples negative frames that are out of context range from the same video to avoid this problem.
【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis的更多相关文章
- 【CV】ICCV2015_Describing Videos by Exploiting Temporal Structure
Describing Videos by Exploiting Temporal Structure Note here: it's a learning note on the topic of v ...
- 【转载】Hierarchal Temporal Memory (HTM)
最近在看机器学习,看能否根据已有的历史来预测Hardware的故障发生概率.下文是一篇很有意思的文章,转自 http://numenta.org/htm.html. NuPIC是一个开源项目,用来实现 ...
- 【CV】ICCV2015_Unsupervised Learning of Spatiotemporally Coherent Metrics
Unsupervised Learning of Spatiotemporally Coherent Metrics Note here: it's a learning note on the to ...
- 【DB2】SQL0437W Performance for this complex query may be sub-optimal
参考链接 Technote (troubleshooting) Problem(Abstract) Error [IBM][CLI Driver][DB2/6000] SQL0437W Perform ...
- 【CV】CVPR2015_A Discriminative CNN Video Representation for Event Detection
A Discriminative CNN Video Representation for Event Detection Note here: it's a learning note on the ...
- 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...
- 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos
Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...
- 【题解】[USACO12JAN]视频游戏的连击Video Game Combos
好久没有写博客了,好惭愧啊……虽然这是一道弱题但还是写一下吧. 这道题目的思路应该说是很容易形成:字符串+最大值?自然联想到学过的AC自动机与DP.对于给定的字符串建立出AC自动机,dp状态dp[i] ...
- 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...
随机推荐
- 如何以SYSTEM用户运行CMD
有的时候有些文件在管理员账户不能删除,这个时候需要在SYSTEM用户下删除. 可以通过以SYSTEM权限运行CMD来删除某些文件或目录的目的. 1. 从微软网站下载PSTool. 2. 以管理员运行C ...
- vue学习笔记1-基本知识
1.npm 安装node.js的时候会一起安装npm包管理器,能够解决nodejs代码部署问题,常见使用如下: 允许用户从npm服务器下载别人编写的第三方包到本地应用允许用户从npm服务器下载并安装别 ...
- SQLite基本操作-----IOS(如有雷同,纯属巧合)
一.常用方法 sqlite3 *db, 数据库句柄,跟文件句柄FILE很类似 sqlite3_stmt *stmt, 这个相当于ODBC的Command对象,用于保存编译好 ...
- 17秋 软件工程 Alpha展示博客
成员简介 姓名 个人简介 博客地址 郑世强 郑世强,计算机三班,了解java web端和Android端编程,使用过Spring MVC和Spring Boot开发商业程序,Android端学习了rx ...
- linux之基本命令进阶
一 配置yum源管理与软件管理 yum常见工具 tree telent sl cowsay yum install tree #安装tree命令,以树形目录显示 #由于每次安装都有确认的提示,取 ...
- 深入理解Intent和IntentFiler(一)
http://blog.csdn.net/u012637501/article/details/41080891 为了比较深刻的理解并灵活使用Intent,我计划将这部分的学习分为两步:一是深入理解I ...
- 2018-2019-2 网络对抗技术 20165318 Exp1 PC平台逆向破解
实验模块 (一)直接修改程序机器指令,改变程序执行流程: (二)通过构造输入参数,造成BOF攻击,改变程序执行流: (三)注入Shellcode并执行: 实验准备 设置共享文件夹(这一步我已经在之前安 ...
- 2017-2018-2 20155314《网络对抗技术》Exp9 Web安全基础
2017-2018-2 20155314<网络对抗技术>Exp9 Web安全基础 目录 实验目标 实验内容 实验环境 基础问题回答 预备知识 实验步骤--WebGoat实践 0x10 We ...
- 一图尽知XMIND
- Python虚拟环境和包管理工具Pipenv的使用详解--看完这一篇就够了
前言 Python虚拟环境是一个虚拟化,从电脑独立开辟出来的环境.在这个虚拟环境中,我们可以pip安装各个项目不同的依赖包,从全局中隔离出来,利于管理. 传统的Python虚拟环境有virtualen ...