【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs
Note here: it's a learning notes on new LSTMs architecture used as an unsupervised learning way of video representations.
(More unsupervised learning related topics, you can refer to:
Learning Temporal Embeddings for Complex Video Analysis
Unsupervised Learning of Visual Representations using Videos
Unsupervised Visual Representation Learning by Context Prediction)
Link: http://arxiv.org/abs/1502.04681
Motivation:
- Understanding temporal sequences is important for solving many video related problems. We should utilize temporal structure of videos as a supervisory signal for unsupervised learning.
Proposed model:
In this paper, the author proposed three models based on LSTM:
1) LSTM Autoencoder Model:

This model is composed of two parts, the encoder and the decoder.
The encoder accepts sequences of frames as input, and the learned representation generated from encoder are copied to decoder as initial input. Then the decoder should reconstruct similar images like input frames in reverse order.
(This is called unconditional version, while a conditional version receives last generated output of decoder as input, shown as the dashed boxes below)
Intuition: The reconstruction work requires the network to capture information about the appearance of objects and the background, this is exactly the information that we would like the representation to contain.
2) LSTM Future Predictor Model:

This model is similar with the one above. The main difference lies in the output. Output of this model is the prediction of frames that come just after the input sequences. It also varies with conditional/unconditional versions just like the description above.
Intuition: In order to predict the next few frames correctly, the model needs information about which objects are present and how they are moving so that the motion can be extrapolated.
3) A Composite Model:

This model combines "input reconstruction" and "future prediction" together to form a more powerful model. These two modules share a same encoder, which encodes input sequences into a feature vector and copy them to different decoders.
Intuition: this only encoder learns representations that contain not only static appearance of objects&background, but also the dynamic informations like moving objects and their moving pattern.
【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs的更多相关文章
- 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos
Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...
- 论文阅读笔记(三)【AAAI2017】:Learning Heterogeneous Dictionary Pair with Feature Projection Matrix for Pedestrian Video Retrieval via Single Query Image
Introduction (1)IVPR问题: 根据一张图片从视频中识别出行人的方法称为 image to video person re-id(IVPR) 应用: ① 通过嫌犯照片,从视频中识别出嫌 ...
- ZH奶酪:【阅读笔记】Deep Learning, NLP, and Representations
中文译文:深度学习.自然语言处理和表征方法 http://blog.jobbole.com/77709/ 英文原文:Deep Learning, NLP, and Representations ht ...
- 【ML】Two-Stream Convolutional Networks for Action Recognition in Videos
Two-Stream Convolutional Networks for Action Recognition in Videos & Towards Good Practices for ...
- 【ML】ICLR2016_Delving Deeper into Convolutional Networks
ICLR2016_DELVING DEEPER INTO CONVOLUTIONAL NETWORKS Note here: Ballas recently proposed a novel fram ...
- 【RS】CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Collaborative Filtering-CoupledCF:在推荐系统深度协作过滤中学习显式和隐式的用户物品耦合
[论文标题]CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Colla ...
- 【RS】List-wise learning to rank with matrix factorization for collaborative filtering - 结合列表启发排序和矩阵分解的协同过滤
[论文标题]List-wise learning to rank with matrix factorization for collaborative filtering (RecSys '10 ...
- 【RS】Deep Learning based Recommender System: A Survey and New Perspectives - 基于深度学习的推荐系统:调查与新视角
[论文标题]Deep Learning based Recommender System: A Survey and New Perspectives ( ACM Computing Surveys ...
- 【ML】Predict and Constrain: Modeling Cardinality in Deep Structured Prediction -预测和约束:在深度结构化预测中建模基数
[论文标题]Predict and Constrain: Modeling Cardinality in Deep Structured Prediction (35th-ICML,PMLR) [ ...
随机推荐
- 【PAT】B1063 计算谱半径(20 分)
水题,没有难点 #include<stdio.h> #include<algorithm> #include<math.h> using namespace std ...
- Lua 与 C 交互之UserData(4)
lua作为脚本于要能够使用宿主语言的类型,不管是宿主基本的或者扩展的类型结构,所以Lua提供的UserData来满足扩展的需求.在Lua中使用宿主语言的类型至少要考虑到几个方面: 数据内存 生命周期 ...
- Class doesn't contain any JAX-RS annotated method
项目中使用了Jersey RESTful风格的注解 , 根据错误提示就知道了 , 在类中没有带注解的方法 ,所以只要在方法上添加 @path() 注解就行了,至少要有一个方法带有Jersey注解
- 拓普微小尺寸TFT液晶屏-高性价比
智能模块(Smart LCD)是专为工业显示应用而设计的TFT液晶显示模块. 模块自带主控IC.Flash存储器.实时嵌入式操作系统,客户主机可把要存储的数据(如背景图.图标等)存储到屏的flash中 ...
- C#接口的显隐实现
显示接口实现与隐式接口实现 何为显式接口实现.隐式接口实现?简单概括,使用接口名作为方法名的前缀,这称为“显式接口实现”:传统的实现方式,称为“隐式接口实现”.下面给个例子. IChineseGree ...
- 数据库连性池性能测试(hikariCP,druid,tomcat-jdbc,dbcp,c3p0)
文章转自 https://www.tuicool.com/articles/qayayiM 摘要: 本文主要是对这hikariCP,druid,tomcat-jdbc,dbcp,c3p0几种连接池的 ...
- mybatis基础系列(一)——mybatis入门
好久不发博客了,写博文的一个好处是能让心静下来,整理下之前学习过的一些知识一起分享,大神路过~ mybatis简介 MyBatis 是一款优秀的持久层框架,它支持定制化 SQL.存储过程以及高级映射. ...
- 纯html页面中js如何获得项目路径
js,全称javascript,不过虽然是以java开头,不过与java一点关系都没有. js和java有如下区别: (1)js是浏览器端的语言,而java是服务器端的语言. (2)js是动态语言,j ...
- java 面向对象基本概念
1.面向对象与面向过程 (1).都是解决问题的思维方式,都是代码组织的方式. (2).解决简单问题可以使用面向过程 (3).解决复杂问题:宏观上使用面向对象把握,微观处理上仍然是面向过程. 2.面向对 ...
- kubernetes集群中对多个pod操作命令
$ for i in 0 1; do kubectl exec web-$i -- sh -c 'echo hello $(hostname) > /usr/share/nginx/html/i ...