【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs
Note here: it's a learning notes on new LSTMs architecture used as an unsupervised learning way of video representations.
(More unsupervised learning related topics, you can refer to:
Learning Temporal Embeddings for Complex Video Analysis
Unsupervised Learning of Visual Representations using Videos
Unsupervised Visual Representation Learning by Context Prediction)
Link: http://arxiv.org/abs/1502.04681
Motivation:
- Understanding temporal sequences is important for solving many video related problems. We should utilize temporal structure of videos as a supervisory signal for unsupervised learning.
Proposed model:
In this paper, the author proposed three models based on LSTM:
1) LSTM Autoencoder Model:

This model is composed of two parts, the encoder and the decoder.
The encoder accepts sequences of frames as input, and the learned representation generated from encoder are copied to decoder as initial input. Then the decoder should reconstruct similar images like input frames in reverse order.
(This is called unconditional version, while a conditional version receives last generated output of decoder as input, shown as the dashed boxes below)
Intuition: The reconstruction work requires the network to capture information about the appearance of objects and the background, this is exactly the information that we would like the representation to contain.
2) LSTM Future Predictor Model:

This model is similar with the one above. The main difference lies in the output. Output of this model is the prediction of frames that come just after the input sequences. It also varies with conditional/unconditional versions just like the description above.
Intuition: In order to predict the next few frames correctly, the model needs information about which objects are present and how they are moving so that the motion can be extrapolated.
3) A Composite Model:

This model combines "input reconstruction" and "future prediction" together to form a more powerful model. These two modules share a same encoder, which encodes input sequences into a feature vector and copy them to different decoders.
Intuition: this only encoder learns representations that contain not only static appearance of objects&background, but also the dynamic informations like moving objects and their moving pattern.
【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs的更多相关文章
- 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos
Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...
- 论文阅读笔记(三)【AAAI2017】:Learning Heterogeneous Dictionary Pair with Feature Projection Matrix for Pedestrian Video Retrieval via Single Query Image
Introduction (1)IVPR问题: 根据一张图片从视频中识别出行人的方法称为 image to video person re-id(IVPR) 应用: ① 通过嫌犯照片,从视频中识别出嫌 ...
- ZH奶酪:【阅读笔记】Deep Learning, NLP, and Representations
中文译文:深度学习.自然语言处理和表征方法 http://blog.jobbole.com/77709/ 英文原文:Deep Learning, NLP, and Representations ht ...
- 【ML】Two-Stream Convolutional Networks for Action Recognition in Videos
Two-Stream Convolutional Networks for Action Recognition in Videos & Towards Good Practices for ...
- 【ML】ICLR2016_Delving Deeper into Convolutional Networks
ICLR2016_DELVING DEEPER INTO CONVOLUTIONAL NETWORKS Note here: Ballas recently proposed a novel fram ...
- 【RS】CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Collaborative Filtering-CoupledCF:在推荐系统深度协作过滤中学习显式和隐式的用户物品耦合
[论文标题]CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Colla ...
- 【RS】List-wise learning to rank with matrix factorization for collaborative filtering - 结合列表启发排序和矩阵分解的协同过滤
[论文标题]List-wise learning to rank with matrix factorization for collaborative filtering (RecSys '10 ...
- 【RS】Deep Learning based Recommender System: A Survey and New Perspectives - 基于深度学习的推荐系统:调查与新视角
[论文标题]Deep Learning based Recommender System: A Survey and New Perspectives ( ACM Computing Surveys ...
- 【ML】Predict and Constrain: Modeling Cardinality in Deep Structured Prediction -预测和约束:在深度结构化预测中建模基数
[论文标题]Predict and Constrain: Modeling Cardinality in Deep Structured Prediction (35th-ICML,PMLR) [ ...
随机推荐
- sublime text3最常用快捷键
Ctrl+Shift+P:打开命令面板 Ctrl+P:搜索项目中的文件 Ctrl+G:跳转到第几行 Ctrl+W:关闭当前打开文件 Ctrl+Shift+W:关闭所有打开文件 Ctrl+Shift+V ...
- Hbase-2.0.0_04_Hbase原理
参考博客:Hadoop HBase概念学习系列 参考博客:Hadoop HBase概念学习系列之HBase里的Zookeeper(二十一) 参考博客:Hadoop HBase概念学习系列之HBase里 ...
- 【阿里八八】团队Alpha博客链接目录
团队Alpha冲刺博客 阿里八八Alpha阶段Scrum(1/12) 阿里八八Alpha阶段Scrum(2/12) 阿里八八Alpha阶段Scrum(3/12) 阿里八八Alpha阶段Scrum(4/ ...
- Mac OS X 下优化 Terminal,一篇就够了!
先上最终效果图: 目录 目录 1. 相关工具介绍 2. 配置总览 3. 安装步骤 3.1. 安装 iTerm2 3.2. 安装XCode's Command line tools 3.3. 检查 zs ...
- webpack打包去掉console.log打印与debugger调试
如图,找到build/webpack.prod.conf.js 在 UglifyJsPlugin 插件下添加下列代码 drop_debugger: true, drop_console: true
- C# X509Certificate类 调用证书
一.命名空间 using System.Security.Cryptography.X509Certificates; 二.调用代码 string certPath = Server.MapPath( ...
- Bootstrap收尾
一 响应式布局 二 Bootstrap补充 三 常用插件 一 响应式布局 响应式介绍 - 响应式布局是什么? 同一个网页在不同的终端上呈现不同的布局等 - 响应式怎么实现的? 1. CSS3 m ...
- 【洛谷】【模拟+栈】P4711 「化学」相对分子质量
[题目传送门:] [戳] (https://www.luogu.org/problemnew/show/P4711) [算法分析:] 关于一个分子拆分后的产物,一共有三种情况: 原子 原子团 水合物 ...
- jvm虚拟机分享课笔记
深入理解jvm虚拟机分享 1. jvm执行流程 java-编译-.class—类加载器(随时随地加载)--[进入java虚拟机] 执行引擎—本地方法接口---本地方法库 运行时数据区 2. 运行时数据 ...
- pyspider爬一批文章保存到word中
最近一直在爬新闻,对于新闻爬取的套路还是比较熟悉的.一个群友发布了一个爬文章入word的任务,我果断接单,自我挑战一下,更何况完成任务还有赏金,哈哈. 任务大概是这样的,有一个文章列表[http:// ...