【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs
Note here: it's a learning notes on new LSTMs architecture used as an unsupervised learning way of video representations.
(More unsupervised learning related topics, you can refer to:
Learning Temporal Embeddings for Complex Video Analysis
Unsupervised Learning of Visual Representations using Videos
Unsupervised Visual Representation Learning by Context Prediction)
Link: http://arxiv.org/abs/1502.04681
Motivation:
- Understanding temporal sequences is important for solving many video related problems. We should utilize temporal structure of videos as a supervisory signal for unsupervised learning.
Proposed model:
In this paper, the author proposed three models based on LSTM:
1) LSTM Autoencoder Model:

This model is composed of two parts, the encoder and the decoder.
The encoder accepts sequences of frames as input, and the learned representation generated from encoder are copied to decoder as initial input. Then the decoder should reconstruct similar images like input frames in reverse order.
(This is called unconditional version, while a conditional version receives last generated output of decoder as input, shown as the dashed boxes below)
Intuition: The reconstruction work requires the network to capture information about the appearance of objects and the background, this is exactly the information that we would like the representation to contain.
2) LSTM Future Predictor Model:

This model is similar with the one above. The main difference lies in the output. Output of this model is the prediction of frames that come just after the input sequences. It also varies with conditional/unconditional versions just like the description above.
Intuition: In order to predict the next few frames correctly, the model needs information about which objects are present and how they are moving so that the motion can be extrapolated.
3) A Composite Model:

This model combines "input reconstruction" and "future prediction" together to form a more powerful model. These two modules share a same encoder, which encodes input sequences into a feature vector and copy them to different decoders.
Intuition: this only encoder learns representations that contain not only static appearance of objects&background, but also the dynamic informations like moving objects and their moving pattern.
【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs的更多相关文章
- 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos
Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...
- 论文阅读笔记(三)【AAAI2017】:Learning Heterogeneous Dictionary Pair with Feature Projection Matrix for Pedestrian Video Retrieval via Single Query Image
Introduction (1)IVPR问题: 根据一张图片从视频中识别出行人的方法称为 image to video person re-id(IVPR) 应用: ① 通过嫌犯照片,从视频中识别出嫌 ...
- ZH奶酪:【阅读笔记】Deep Learning, NLP, and Representations
中文译文:深度学习.自然语言处理和表征方法 http://blog.jobbole.com/77709/ 英文原文:Deep Learning, NLP, and Representations ht ...
- 【ML】Two-Stream Convolutional Networks for Action Recognition in Videos
Two-Stream Convolutional Networks for Action Recognition in Videos & Towards Good Practices for ...
- 【ML】ICLR2016_Delving Deeper into Convolutional Networks
ICLR2016_DELVING DEEPER INTO CONVOLUTIONAL NETWORKS Note here: Ballas recently proposed a novel fram ...
- 【RS】CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Collaborative Filtering-CoupledCF:在推荐系统深度协作过滤中学习显式和隐式的用户物品耦合
[论文标题]CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Colla ...
- 【RS】List-wise learning to rank with matrix factorization for collaborative filtering - 结合列表启发排序和矩阵分解的协同过滤
[论文标题]List-wise learning to rank with matrix factorization for collaborative filtering (RecSys '10 ...
- 【RS】Deep Learning based Recommender System: A Survey and New Perspectives - 基于深度学习的推荐系统:调查与新视角
[论文标题]Deep Learning based Recommender System: A Survey and New Perspectives ( ACM Computing Surveys ...
- 【ML】Predict and Constrain: Modeling Cardinality in Deep Structured Prediction -预测和约束:在深度结构化预测中建模基数
[论文标题]Predict and Constrain: Modeling Cardinality in Deep Structured Prediction (35th-ICML,PMLR) [ ...
随机推荐
- windows7系统最大支持多少内存
目前Windows 7 64位版仅能使用最大为192GB内存. 这是各个版本的具体数据:64位的Windows 7家庭普通版最高可支持8GB内存,家庭高级版最高可支持16GB内存,64位的Win ...
- 使用JavaConfig和注解方式实现零xml配置的Spring MVC项目
1. 引言 Spring MVC是Spring框架重要组成部分,是一款非常优秀的Web框架.Spring MVC以DispatcherServlet为核心,通过可配置化的方式去处理各种web请求. 在 ...
- python3的C3算法
一.基本概念 1. mro序列 MRO是一个有序列表L,在类被创建时就计算出来. 通用计算公式为: mro(Child(Base1,Base2)) = [ Child ] + merge( mro(B ...
- vue_模板渲染
渲染 当获取到后端数据后,我们会把它按照一定的规则加载到写好的模板中,输出成在浏览器中显示的HTML,这个过程就称之为渲染. vue.js是在前端(即浏览器内)进行的模板渲染. 前后端渲染对比 前端渲 ...
- GUI概述与Frame演示
java 图形化界面的对象存在这两个包中: java.awt :Abstract WindowsToolkit(抽象窗口工具包)需要调用本地系统方法实现功能,属重量级控件 javax.swing:在a ...
- Django templates 模板的语法
MVC 以及 MTV MVC: M : model -->> 存取数据(模型) V: view -->> 信息的展示(视图) C: controller -->> ...
- CNAME记录和A记录
主机名:host.abcd.com 别名:一台主机可以提供多种服务,比如http服务和mail服务. 访问http服务就可以使用域名:www.abcd.com 访问mail服务就可以使用域名:mail ...
- 20145203盖泽双 《网络对抗技术》实践九:Web安全基础实践
20145203盖泽双 <网络对抗技术>实践九:Web安全基础实践 1.实践目标 1.理解常用网络攻击技术的基本原理. 2.Webgoat下进行相关实验:SQL注入攻击.XSS攻击.CSR ...
- day14 Python函数之可变长参数
函数参数 1.形参变量只有在被调用时才分配内存单元,在调用结束时,即刻释放所分配的内存单元.因此,形参只在函数内部有效.函数调用结束返回主调用函数后则不能再使用该形参变量 2.实参可以是常量.变量.表 ...
- oracle 12c 12.1.0.2.0 BUG 22562145
Wed May 23 17:46:14 2018TT01: Standby redo logfile selected for thread 1 sequence 42251 for destinat ...