【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos

Unsupervised Learning of Visual Representations using Videos

Note here: it's a learning note on Prof. Gupta's novel work published on ICCV2015. It's really exciting to know how unsupervised learning method can contribute to learn visual representations! Also, Feifei-Li's group published a paper on video representation using unsupervised method in ICCV2015 almost at the same time! I also wrote a review on it, check it here!

Link: http://arxiv.org/pdf/1505.00687v2.pdf

Motivation:

- Supervised learning is popular for CNN to train an excellent model on various visual problems, while the application of unsupervised learning leaves blank.

- People learn concepts quickly without numerous instances for training, and we learning things in a dynamic, mostly unsupervised environment.

- We’re short of labeled video data to do supervised learning, but we can easily access to tons of unlabeled data through Internet, which can be made use of by unsupervised learning.

Proposed Model:

Target: learning visual representations from videos in an unsupervised way

Key idea: tracking of moving object provides supervision

Brief introduction:

- Objective function (constraint): capture the first patch p1 of a moving object, keep tracking of it and get another patch p2 after several frames, then randomly select a negative patch p- from other places. The idea of objective function constrains the distance of p1 and p2 in feature space should be shorter than distance of p1 and p-

- Selection of tracking patch: using IDT to obtain SURF interest points to find out which part of the frame moves most. Setting threshold on the ratio of SURF interest points to avoid noise and camera motion.

- Tracking: using KCF tracker to track the patch

- Overrall pipline:

Feed triplet into three identical CNN, put two fully-connected layers on the top of pooling-5 layer to project into feature space, then computing the ranking loss to back-propagate the network. (note that: these three CNN shares parameters)

Training strategy:

There’re many empirical details to train a more powerful CNN in this work, however I’m not going to dive into it, only give some brief reviews on some the trick.

- Choose of negative samples:

- Random selection in the first 10 epochs of training

- Hard negative mining in later epochs, we search for all the possible negative patches and choose the top K patches which give maximum loss

* Intuition on the result:

See from the table above, [unsup + fp(3 ensemble)] outperforms other methods on the detection task of bus, car, person and train, but falls far behind on detecting bird, cat, dog and sofa, which may give us some intuitions.

【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos的更多相关文章

【CV】ICCV2015_Unsupervised Learning of Spatiotemporally Coherent Metrics
Unsupervised Learning of Spatiotemporally Coherent Metrics Note here: it's a learning note on the to ...
【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...
【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...
【翻译】我钟爱的Visual Studio前端开发工具/扩展
原文:[翻译]我钟爱的Visual Studio前端开发工具/扩展怎么样让Visual Studio更好地编写HTML5, CSS3, JavaScript, jQuery,换句话说就是如何更好地做 ...
论文解读（SimCLR）《A Simple Framework for Contrastive Learning of Visual Representations》
1 题目 <A Simple Framework for Contrastive Learning of Visual Representations> 作者: Ting Chen, Si ...
A Simple Framework for Contrastive Learning of Visual Representations
目录概主要内容流程 projection head g constractive loss augmentation other 代码 Chen T., Kornblith S., Norouz ...
ZH奶酪：【阅读笔记】Deep Learning, NLP, and Representations
中文译文:深度学习.自然语言处理和表征方法 http://blog.jobbole.com/77709/ 英文原文:Deep Learning, NLP, and Representations ht ...
【RS】CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Collaborative Filtering-CoupledCF：在推荐系统深度协作过滤中学习显式和隐式的用户物品耦合
[论文标题]CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Colla ...
【RS】List-wise learning to rank with matrix factorization for collaborative filtering - 结合列表启发排序和矩阵分解的协同过滤
[论文标题]List-wise learning to rank with matrix factorization for collaborative filtering (RecSys '10 ...

随机推荐

【PAT】B1041 考试座位号（15 分）
/* */ #include<stdio.h> #include<algorithm> using namespace std; struct stu{ char number ...
Java 调用Restful API接口的几种方式--HTTPS
摘要:最近有一个需求,为客户提供一些Restful API 接口,QA使用postman进行测试,但是postman的测试接口与java调用的相似但并不相同,于是想自己写一个程序去测试Restful ...
转://点评Oracle11g新特性之动态变量窥视
1. 11g之前的绑定变量窥视我们都知道,为了可以让SQL语句共享运行计划,oracle始终都是强调在进行应用系统的设计时,必须使用绑定变量,也就是用一个变量来取代原来出如今SQL语句里的字面值.比 ...
转://三分钟读懂Oracle数据库容灾架之DataGuard
目前市场上针对Oracle数据库常见的容灾产品大致可以分为两大类. Oracle 公司自己的容灾产品非Oracle公司的容灾产品 Oracle公司目前的容灾产品有我们常见的DataGuard和属于中 ...
Discrete Logging ZOJ - 1898 (模板题大小步算法)
就是求Ax三B(mod C)当C为素数时 #include<cstdio> #include<cstring> #include<cmath> #include&l ...
Matlab 学习视频集合
http://www.ilovematlab.cn/thread-22239-1-1.html
时间同步ctss与ntp的关系【CTSSD Runs in Observer Mode Even Though No Time Sync Software is Running (Doc ID 1054006.1) 】
CTSSD Runs in Observer Mode Even Though No Time Sync Software is Running (Doc ID 1054006.1) In this ...
VsCode插件开发之插件初步通信
参考了Egret Wing,想像Egret Wing那样在上方titlebar最右边上面增加一个menu(这个menu相对于一个按钮,当点击这个按钮时会出现一个window弹框,这个window弹框里 ...
PAT A1103 Integer Factorization （30 分）——dfs，递归
The K−P factorization of a positive integer N is to write N as the sum of the P-th power of K positi ...
Django ORM相关
1. ORM 外键关联查询和多对多关系正反向查询 Class Classes(): name = CF class Student(): name = CF class = FK(to="C ...

【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos

【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos的更多相关文章

随机推荐

热门专题