【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis
Learning Temporal Embeddings for Complex Video Analysis
Note here: it's a review note on novel work from Feifei-Li's group about video representations, published on ICCV2015.
Motivation:
- Labeled video data is short for learning video representations, we need an unsupervised way.
- Context(temporal structure) is significant for video representations.
Proposed model:
- give one query frame, we can predict corresponding context representations(embeddings) of it through this model.
- Pipline:
\(f_{vj}(s_{vj};w_{e})\): embedding function
(\(W_{e}\) is the only parameter here we need to train for)
- Training:
\(h_{vj}=\frac{1}{2T}\sum_{t=1}^T(f_{vj+t}+f_{vj-t})\): context vector
Unsupervised learning objective (SVM Loss):
\(J(W_{e})=\sum_{v\in V}\sum_{S_{vj\in V},S\neq S_{vj}}max(0,1-(f_{vj}-f_{\_})\cdot h_{vj})\)
(\(f_{vj}\) is the embedding of frame \(S_{vj}\))
(\(f_{\_}\) is a negative frame which is not highly relevant to \(S_{vj}\))
(\(h_{vj}\) is the context embedding of frame \(S_{vj}\))
We’ll go further into the choosing of negative frames and context range later.
Intuition:
This model momorizes the context of specific frame. It utilizes the spatial appearance of the frame to form an embedding vector, which infers its context information.
Spatial feature learned from CNN \(\xrightarrow{\;\;\;W_{e}\;\;projection\;\;\;}\) Temporal feature embeds context
(\(W_{e}\) memorizes the temporal pattern during training)
With the temporal structure, even though some frames are not appearance similar, they can also be near in the feature space as long as they share similar context. Like following:
There’re two takeaways in the training process:
- Multi-resolution sampling: it’s hard to decide a generic context range(T), for videos own different paces, some may be quick while some are slow. This paper proposed a multi-resolution sampling strategy, instead of only sampling the context with same frame gap, it sampling with various gap lengths. That’s a trade-off between semantic relatedness and visual variaty.
- Hard Negative: choosing of negative samples are important for a robust model. It’s natural to come up with sampling negative frames in other videos and context frames from the same video, but this may cause the model overfit for some video-specific, less sementic properties, like lighting, camera characteristics and background. As a result, this paper also samples negative frames that are out of context range from the same video to avoid this problem.
【CV】ICCV2015_Learning Temporal Embeddings for Complex Video Analysis的更多相关文章
- 【CV】ICCV2015_Describing Videos by Exploiting Temporal Structure
Describing Videos by Exploiting Temporal Structure Note here: it's a learning note on the topic of v ...
- 【转载】Hierarchal Temporal Memory (HTM)
最近在看机器学习,看能否根据已有的历史来预测Hardware的故障发生概率.下文是一篇很有意思的文章,转自 http://numenta.org/htm.html. NuPIC是一个开源项目,用来实现 ...
- 【CV】ICCV2015_Unsupervised Learning of Spatiotemporally Coherent Metrics
Unsupervised Learning of Spatiotemporally Coherent Metrics Note here: it's a learning note on the to ...
- 【DB2】SQL0437W Performance for this complex query may be sub-optimal
参考链接 Technote (troubleshooting) Problem(Abstract) Error [IBM][CLI Driver][DB2/6000] SQL0437W Perform ...
- 【CV】CVPR2015_A Discriminative CNN Video Representation for Event Detection
A Discriminative CNN Video Representation for Event Detection Note here: it's a learning note on the ...
- 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...
- 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos
Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...
- 【题解】[USACO12JAN]视频游戏的连击Video Game Combos
好久没有写博客了,好惭愧啊……虽然这是一道弱题但还是写一下吧. 这道题目的思路应该说是很容易形成:字符串+最大值?自然联想到学过的AC自动机与DP.对于给定的字符串建立出AC自动机,dp状态dp[i] ...
- 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...
随机推荐
- 17秋 软件工程 Alpha 事后诸葛亮会议
题目: 团队作业--Alpha冲刺 17秋 软件工程 Alpha 事后诸葛亮会议 关于评价与建议的反馈 评价1:管理部门我觉得对我已经用处不大了不过对新生用处很大.像学长说的一样,里面不是流程很懂但是 ...
- BSOJ 3899 -- 【CQOI2014】 数三角形
Description 给定一个n*m的网格,请计算三个点都在格点上的三角形共有多少个.下图为4*4的网格上的一个三角形. 注意三角形的三点不能共线. Input 输入一行,包含两个空格分隔的正整数 ...
- Django之Template
模板层(template) 概念: 模板与html的区别: 模板=html+模板语法 模板语法: 1 变量: {{}} 深度查询: 通过句点符. 列表,字典 clas ...
- Excel函数(不定期持续更新)
1.COUNTIF函数 COUNTIF函数用来计算单元格区域内符合条件的单元格个数. COUNTIF函数只有两个参数 COUNTIF(单元格区域,计算的条件) 例如:计算上海市的数量
- swift的类型描述符
Metatype Types A concrete or existential metatype in SIL must describe its representation. This can ...
- nodejs中async使用
waterfall , parallel , series , eachSeries //var async = require('async'); /*** *① * 串行有关联 执行每个函数 ...
- (2)free详解 (每周一个linux命令系列)
(2)free详解 (每周一个linux命令系列) linux命令 free详解 引言:今天的命令是用来看内存的free free 换一个套路,我们先看man free中对free的描述: Displ ...
- go标准库的学习-mime/multipart
参考:https://studygolang.com/pkgdoc 导入方式: import "mime/multipart" multipart实现了MIME的multipart ...
- docker被入侵后.............
服务器上线后,怎么发现总有个 xmrig 的容器在跑,删了还出来 那么恭喜你!!你的服务器已经被入侵了!! $ docker ps IMAGE COMMAND ...
- 【Topcoder 10384】KingdomMap
Topcoder 10384 题意:给你一个森林,求是否能将这个森林的点集分成两部分,每部分放在一列中,要求边是直的并且不能交叉,问最少删哪几条边. 思路:我们考虑森林中的一棵树,以\(u\)为根,将 ...