【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos

Unsupervised Learning of Visual Representations using Videos

Note: this is a learning note on Prof. Gupta's novel work published at ICCV 2015. It's really exciting to see how unsupervised learning can contribute to learning visual representations! Also, Fei-Fei Li's group published a paper on video representation using an unsupervised method at ICCV 2015 at almost the same time! I also wrote a review of it, check it here!

Link: http://arxiv.org/pdf/1505.00687v2.pdf

Motivation:

- Supervised learning is the popular way to train excellent CNN models for various visual problems, while the application of unsupervised learning remains largely unexplored.

- People learn concepts quickly without numerous training instances, and we learn things in a dynamic, mostly unsupervised environment.

- We’re short of labeled video data for supervised learning, but we can easily access tons of unlabeled data on the Internet, which unsupervised learning can make use of.

Proposed Model:

Target: learning visual representations from videos in an unsupervised way

Key idea: tracking a moving object provides the supervision

Brief introduction:

- Objective function (constraint): capture a first patch p1 of a moving object, track it for several frames to get a second patch p2, then randomly select a negative patch p- from elsewhere. The objective constrains the distance between p1 and p2 in feature space to be smaller than the distance between p1 and p-.

- Selection of tracking patch: extract SURF interest points and use IDT (Improved Dense Trajectories) to estimate the motion of each point, which shows which part of the frame moves most. Thresholds on the ratio of moving SURF interest points filter out noise and camera motion.

- Tracking: use the KCF (Kernelized Correlation Filters) tracker to track the patch.

- Overall pipeline:

Feed the triplet into three identical CNNs, put two fully connected layers on top of the pool5 layer to project into feature space, then compute the ranking loss and back-propagate it through the network. (Note that these three CNNs share parameters.)
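The constraint on p1, p2 and p- can be sketched in a few lines of Python. This is only a minimal sketch, not the authors' implementation: it assumes cosine distance as the feature-space distance and a margin of 0.5, and the names `cosine_distance` and `ranking_loss` are hypothetical. The plain lists stand in for the feature vectors that the shared-parameter CNN would output for each patch.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(a, b) for two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ranking_loss(f_p1, f_p2, f_neg, margin=0.5):
    """Hinge loss: zero once D(p1, p-) exceeds D(p1, p2) by at least the margin.

    The margin value is an assumption made for this illustration.
    """
    d_pos = cosine_distance(f_p1, f_p2)
    d_neg = cosine_distance(f_p1, f_neg)
    return max(0.0, d_pos - d_neg + margin)


# Toy features: the tracked patch points the same way as p1, the negative is orthogonal.
f_p1 = [1.0, 0.0]
f_p2 = [1.0, 0.0]
f_neg = [0.0, 1.0]

satisfied = ranking_loss(f_p1, f_p2, f_neg)  # constraint holds, so no loss
violated = ranking_loss(f_p1, f_neg, f_p2)   # roles swapped, so the loss is positive
print(satisfied, violated)
```

When the constraint is already satisfied with room to spare, the loss is zero and the triplet contributes no gradient, which is one reason the choice of negative patches matters during training.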

Training strategy:

There are many empirical details behind training a more powerful CNN in this work. I’m not going to dive into them here and will only briefly review some of the tricks.

- Choice of negative samples:

- Random selection in the first 10 epochs of training

- Hard negative mining in later epochs: search over all the possible negative patches and choose the top K patches that give the maximum loss
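The top-K selection in hard negative mining can be illustrated as follows. This is a hedged sketch rather than the paper's code: the function name `top_k_hard_negatives`, the margin of 0.5, and the toy distances are all assumptions, and the distances are taken as precomputed instead of coming from a network.

```python
import heapq


def top_k_hard_negatives(d_pos, d_negs, k, margin=0.5):
    """Return indices of the k candidate negatives with the largest ranking loss.

    d_pos:  distance between p1 and its tracked patch p2
    d_negs: distances between p1 and each candidate negative patch
    margin: assumed value, chosen only for this illustration
    """
    losses = [max(0.0, d_pos - d_neg + margin) for d_neg in d_negs]
    # nlargest returns the indices ordered from highest loss to lowest.
    return heapq.nlargest(k, range(len(losses)), key=losses.__getitem__)


d_pos = 0.2
d_negs = [0.9, 0.3, 0.6, 0.1]  # hypothetical distances to four candidate negatives
hard = top_k_hard_negatives(d_pos, d_negs, k=2)
print(hard)
```

Because the loss grows as a negative gets closer to p1, the selected patches are simply the negatives that the current network confuses most with the tracked object.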

Intuition on the result:

As the table above shows, [unsup + fp(3 ensemble)] outperforms the other methods on the detection of bus, car, person and train, but falls far behind on bird, cat, dog and sofa, which may offer some intuition about what the learned representation captures.
