【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis
Learning Temporal Embeddings for Complex Video Analysis
Note here: it's a review note on novel work from Feifei-Li's group about video representations, published on ICCV2015.
Motivation:
- Labeled video data is short for learning video representations, we need an unsupervised way.
- Context(temporal structure) is significant for video representations.
Proposed model:
- give one query frame, we can predict corresponding context representations(embeddings) of it through this model.
- Pipline:

\(f_{vj}(s_{vj};w_{e})\): embedding function
(\(W_{e}\) is the only parameter here we need to train for)
- Training:
\(h_{vj}=\frac{1}{2T}\sum_{t=1}^T(f_{vj+t}+f_{vj-t})\): context vector
Unsupervised learning objective (SVM Loss):
\(J(W_{e})=\sum_{v\in V}\sum_{S_{vj\in V},S\neq S_{vj}}max(0,1-(f_{vj}-f_{\_})\cdot h_{vj})\)
(\(f_{vj}\) is the embedding of frame \(S_{vj}\))
(\(f_{\_}\) is a negative frame which is not highly relevant to \(S_{vj}\))
(\(h_{vj}\) is the context embedding of frame \(S_{vj}\))
We’ll go further into the choosing of negative frames and context range later.
Intuition:
This model momorizes the context of specific frame. It utilizes the spatial appearance of the frame to form an embedding vector, which infers its context information.
Spatial feature learned from CNN \(\xrightarrow{\;\;\;W_{e}\;\;projection\;\;\;}\) Temporal feature embeds context
(\(W_{e}\) memorizes the temporal pattern during training)
With the temporal structure, even though some frames are not appearance similar, they can also be near in the feature space as long as they share similar context. Like following:

There’re two takeaways in the training process:
- Multi-resolution sampling: it’s hard to decide a generic context range(T), for videos own different paces, some may be quick while some are slow. This paper proposed a multi-resolution sampling strategy, instead of only sampling the context with same frame gap, it sampling with various gap lengths. That’s a trade-off between semantic relatedness and visual variaty.

- Hard Negative: choosing of negative samples are important for a robust model. It’s natural to come up with sampling negative frames in other videos and context frames from the same video, but this may cause the model overfit for some video-specific, less sementic properties, like lighting, camera characteristics and background. As a result, this paper also samples negative frames that are out of context range from the same video to avoid this problem.
【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis的更多相关文章
- 【CV】ICCV2015_Describing Videos by Exploiting Temporal Structure
Describing Videos by Exploiting Temporal Structure Note here: it's a learning note on the topic of v ...
- 【转载】Hierarchal Temporal Memory (HTM)
最近在看机器学习,看能否根据已有的历史来预测Hardware的故障发生概率.下文是一篇很有意思的文章,转自 http://numenta.org/htm.html. NuPIC是一个开源项目,用来实现 ...
- 【CV】ICCV2015_Unsupervised Learning of Spatiotemporally Coherent Metrics
Unsupervised Learning of Spatiotemporally Coherent Metrics Note here: it's a learning note on the to ...
- 【DB2】SQL0437W Performance for this complex query may be sub-optimal
参考链接 Technote (troubleshooting) Problem(Abstract) Error [IBM][CLI Driver][DB2/6000] SQL0437W Perform ...
- 【CV】CVPR2015_A Discriminative CNN Video Representation for Event Detection
A Discriminative CNN Video Representation for Event Detection Note here: it's a learning note on the ...
- 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...
- 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos
Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...
- 【题解】[USACO12JAN]视频游戏的连击Video Game Combos
好久没有写博客了,好惭愧啊……虽然这是一道弱题但还是写一下吧. 这道题目的思路应该说是很容易形成:字符串+最大值?自然联想到学过的AC自动机与DP.对于给定的字符串建立出AC自动机,dp状态dp[i] ...
- 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...
随机推荐
- mybatis 字段类型Data相
在项目中查询时间段的sql语句(时间类型为datetime或date)(数据库中的时间类型): <if test="beginTime!=null and beginTime!=''& ...
- YUM仓库服务与PXE网络装机
1.yum:基于RPM包构建软件更新机制自动解决依赖关系,软件包由软件包库提供 提供方式:ftp服务:ftp://IP地址/仓库目录 Http服务:http :// IP地址/仓库目录 本地目录:f ...
- Sqoop-1.4.7-部署与常见案例
该文章是基于 Hadoop2.7.6_01_部署 . Hive-1.2.1_01_安装部署 进行的 1. 前言 在一个完整的大数据处理系统中,除了hdfs+mapreduce+hive组成分析系统的核 ...
- Linux Java 环境配置及内置tomcat部署
tar zxvf jdk-8u101-linux-x64.tar.gz vi /etc/profile JAVA_HOME=/home/puma/jdk1.8.0_111JAVA_BIN=/home/ ...
- 添加Nginx为系统服务(设置开机启动)
在本节中,我们将创建一个脚本,将Nginx守护进程转换为实际的系统服务. 这有两个作用:守护程序可以使用标准命令控制,更重要的是,它可以在系统启动时自动启动,并在系统关闭时停止. System V s ...
- PostgreSQL 空间处理函数
PostGIS中的常用函数 以下内容包括比较多的尖括号,发布到blogger的时候会显示不正常,内容太多我也无暇一个个手动改代码,因此如有问题就去参考PostGIS官方文档. 首先需要说明一下,这里许 ...
- BSOJ 2414 -- 【JSOI2011】分特产
Description JYY 带队参加了若干场ACM/ICPC 比赛,带回了许多土特产,要分给实验室的同学们. JYY 想知道,把这些特产分给N 个同学,一共有多少种不同的分法?当然,JYY 不希望 ...
- springmvc 自定义拦截器实现未登录用户的拦截
第一步:编写自定义拦截器类,该类继承HandlerInterceptorAdapter,重写preHandle方法 第二步:配置springmvc.xml文件,定义拦截器属性 登录请求的mappi ...
- hass连接设备
hass如何自动发现mqtt设备 https://www.hachina.io/docs/7230.html 玩家自定义 https://www.hachina.io/6572.html
- shell编程之测试和判断
一.测试 程序运行中经常需要根据实际情况来运行特定的命令或代码段.比如判断某个文件或目录是否存在,如果文件或目录不存在,可能首先创建文件或目录.举例说,要判断文件/var/log/mlocate文件是 ...