Learning Temporal Embeddings for Complex Video Analysis

Note: this is a review of recent work from Fei-Fei Li's group on video representations, published at ICCV 2015.

Link: http://www.cv-foundation.org/openaccess/content_iccv_2015/html/Ramanathan_Learning_Temporal_Embeddings_ICCV_2015_paper.html

Motivation:

- Labeled video data is too scarce for learning video representations, so an unsupervised approach is needed.

- Context (temporal structure) is an important cue for video representations.

Proposed model:

- Given a query frame, the model predicts an embedding that represents the frame's temporal context.

- Pipeline:

\(f_{vj}=f(s_{vj};W_{e})\): embedding function, where \(s_{vj}\) is frame \(j\) of video \(v\)

(\(W_{e}\) is the only parameter that needs to be trained)

- Training:

\(h_{vj}=\frac{1}{2T}\sum_{t=1}^T(f_{vj+t}+f_{vj-t})\): context vector
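The context vector is just an average of the embeddings on either side of the query frame. A minimal NumPy sketch of that formula follows; the function name `context_vector` and the toy embedding values are made up for illustration and are not from the paper.

```python
import numpy as np

def context_vector(frame_embeddings: np.ndarray, j: int, T: int) -> np.ndarray:
    """Average the embeddings of the T frames before and the T frames after frame j."""
    if j - T < 0 or j + T >= len(frame_embeddings):
        raise ValueError("frame j needs T neighbours on each side")
    before = frame_embeddings[j - T:j]          # f_{v,j-T} ... f_{v,j-1}
    after = frame_embeddings[j + 1:j + T + 1]   # f_{v,j+1} ... f_{v,j+T}
    return (before.sum(axis=0) + after.sum(axis=0)) / (2 * T)

# Toy video: 5 frames with 2-dimensional embeddings (hypothetical numbers).
f = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 2.0], [3.0, 0.0], [4.0, 4.0]])
h = context_vector(f, j=2, T=2)
print(h)  # average of frames 0, 1, 3 and 4
```

Note that the query frame's own embedding is excluded from the average, as in the formula.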

Unsupervised learning objective (SVM-style hinge loss):

\(J(W_{e})=\sum_{v\in V}\sum_{s_{vj}\in v,\;s_{-}\neq s_{vj}}\max(0,1-(f_{vj}-f_{-})\cdot h_{vj})\)

(\(f_{vj}\) is the embedding of frame \(s_{vj}\))

(\(f_{-}\) is the embedding of a negative frame \(s_{-}\), one that is not closely related to \(s_{vj}\))

(\(h_{vj}\) is the context vector of frame \(s_{vj}\))
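To make the objective concrete, here is a small sketch of a single term of the sum: the hinge loss for one (frame, negative frame, context) triple. The function name `ranking_hinge_loss` and the vectors are hypothetical; in the actual model the embeddings would come from the network parameterized by \(W_{e}\).

```python
import numpy as np

def ranking_hinge_loss(f_pos: np.ndarray, f_neg: np.ndarray, h: np.ndarray) -> float:
    """One term of the objective: max(0, 1 - (f_pos - f_neg) . h)."""
    return max(0.0, 1.0 - float(np.dot(f_pos - f_neg, h)))

# Hypothetical 2-dimensional embeddings.
h = np.array([1.0, 0.0])          # context vector of the query frame
f_pos = np.array([0.9, 0.1])      # embedding of the query frame
easy_neg = np.array([-1.0, 0.0])  # negative that scores far below the query
hard_neg = np.array([0.5, 0.0])   # negative that scores close to the query

loss_easy = ranking_hinge_loss(f_pos, easy_neg, h)  # margin satisfied, loss is zero
loss_hard = ranking_hinge_loss(f_pos, hard_neg, h)  # margin violated, loss is positive
print(loss_easy, loss_hard)
```

The loss only pushes on triples where the query frame does not beat the negative by a margin of at least 1 along the context direction, which is why negatives that are hard to separate contribute most of the training signal.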

The choice of negative frames and of the context range is discussed further below.

Intuition:

This model memorizes the context of a given frame. It uses the spatial appearance of the frame to form an embedding vector from which the frame's context can be inferred.

Spatial feature learned by the CNN \(\xrightarrow{\;W_{e}\ \text{projection}\;}\) temporal feature that embeds context

(\(W_{e}\) memorizes the temporal pattern during training)

With this temporal structure, frames that do not look alike can still lie close together in the feature space, as long as they share similar context.

There are two takeaways from the training process:

- Multi-resolution sampling: it is hard to pick one generic context range (T), because videos move at different paces; some are fast and some are slow. The paper proposes a multi-resolution sampling strategy: instead of sampling context frames at a single fixed frame gap, it samples them at several gap lengths. This is a trade-off between semantic relatedness and visual variety.

- Hard negatives: the choice of negative samples matters for a robust model. The natural approach is to sample negative frames from other videos and context frames from the same video, but this can make the model overfit to video-specific, less semantic properties such as lighting, camera characteristics, and background. To avoid this, the paper also samples negative frames from the same video, outside the context range.
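The two sampling ideas above can be sketched together in a few lines of Python. This is only an illustration under assumed details: the gap set `(1, 2, 4)`, the rule that a hard negative lies more than `T * gap` frames away from the query, and the function names are all assumptions rather than the paper's exact procedure.

```python
import random

def context_indices(j, T, gap, num_frames):
    """Indices of T frames on each side of j, spaced `gap` apart, or None if they do not fit."""
    idx = [j + sign * t * gap for t in range(1, T + 1) for sign in (-1, 1)]
    return idx if all(0 <= k < num_frames for k in idx) else None

def sample_training_tuple(j, num_frames, T=2, gaps=(1, 2, 4), rng=random):
    """Pick a context at a random resolution plus a hard negative from the same video."""
    usable = [g for g in gaps if context_indices(j, T, g, num_frames) is not None]
    if not usable:
        raise ValueError("no gap length fits around frame j")
    gap = rng.choice(usable)                      # multi-resolution: vary the frame gap
    context = sorted(context_indices(j, T, gap, num_frames))
    reach = T * gap                               # farthest context frame from j
    candidates = [k for k in range(num_frames) if abs(k - j) > reach]
    if not candidates:
        raise ValueError("video too short for an out-of-context negative")
    negative = rng.choice(candidates)             # hard negative: same video, out of context
    return gap, context, negative

rng = random.Random(0)  # seeded only to make the example repeatable
gap, context, negative = sample_training_tuple(j=50, num_frames=100, rng=rng)
print(gap, context, negative)
```

In a full pipeline these same-video negatives would be mixed with negatives drawn from other videos, so the model sees both kinds.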
