Unsupervised Learning of Video Representations using LSTMs

Note here: these are learning notes on a new LSTM architecture for unsupervised learning of video representations.

(For more on unsupervised learning, see:

Learning Temporal Embeddings for Complex Video Analysis

Unsupervised Learning of Visual Representations using Videos

Unsupervised Visual Representation Learning by Context Prediction)

Link: http://arxiv.org/abs/1502.04681 (ICML 2015)

Motivation:

- Understanding temporal sequences is important for solving many video-related problems, so the temporal structure of videos can serve as a supervisory signal for unsupervised learning.

Proposed model:

In this paper, the authors propose three LSTM-based models:

1) LSTM Autoencoder Model:

  This model is composed of two parts, the encoder and the decoder.

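  The encoder reads a sequence of frames, and its final state (the learned representation) is copied into the decoder as the decoder's initial state. The decoder then has to reconstruct the input frames in reverse order. Reversing the target makes optimization easier, because the first frame the decoder must produce is the last one the encoder saw.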

  (This is the unconditional version. In the conditional version, the decoder also receives its last generated output frame as input at each step; these inputs are the dashed boxes in the model figure.)

Intuition: reconstruction requires the network to capture information about the appearance of objects and the background, which is exactly the information we would like the representation to contain.
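The encoder/decoder flow above can be sketched in plain Python. This is a minimal, untrained sketch under stated assumptions: the names `LSTMCell`, `LSTMAutoencoder`, `affine` and `rand_matrix` are hypothetical, weights are small random values, the readout is linear with a squared-error loss, and training (backpropagation through time) is omitted. It shows three things from the description: the encoder state being copied into the decoder, the reversed reconstruction target, and the conditional/unconditional switch.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def affine(W, v):
    """Multiply W by v with a bias term (a trailing 1.0) appended to v."""
    v = list(v) + [1.0]
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates and candidate (c)."""
    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate, acting on [x, h_prev, 1].
        self.W = {g: rand_matrix(rng, n_hidden, n_in + n_hidden + 1) for g in "ifoc"}

    def step(self, x, h, c):
        z = list(x) + list(h)
        i = [sigmoid(a) for a in affine(self.W["i"], z)]
        f = [sigmoid(a) for a in affine(self.W["f"], z)]
        o = [sigmoid(a) for a in affine(self.W["o"], z)]
        g = [math.tanh(a) for a in affine(self.W["c"], z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

class LSTMAutoencoder:
    def __init__(self, frame_dim, n_hidden, conditional=False, seed=0):
        rng = random.Random(seed)
        self.frame_dim = frame_dim
        self.conditional = conditional
        self.encoder = LSTMCell(frame_dim, n_hidden, rng)
        self.decoder = LSTMCell(frame_dim, n_hidden, rng)
        # Assumption: linear readout from hidden state to frame space.
        self.readout = rand_matrix(rng, frame_dim, n_hidden + 1)

    def encode(self, frames):
        h = [0.0] * self.encoder.n_hidden
        c = [0.0] * self.encoder.n_hidden
        for x in frames:
            h, c = self.encoder.step(x, h, c)
        return h, c  # final encoder state = the learned representation

    def decode(self, state, n_steps):
        h, c = state  # decoder starts from a copy of the encoder state
        x = [0.0] * self.frame_dim
        outputs = []
        for _ in range(n_steps):
            h, c = self.decoder.step(x, h, c)
            y = affine(self.readout, h)
            outputs.append(y)
            # Conditional: feed back the frame just generated.
            # Unconditional: zero input, i.e. effectively no input.
            x = y if self.conditional else [0.0] * self.frame_dim
        return outputs

    def reconstruction_loss(self, frames):
        outputs = self.decode(self.encode(frames), len(frames))
        target = frames[::-1]  # reconstruct the input in reverse order
        # Assumption: squared error; the paper's exact loss may differ per dataset.
        return sum((yk - tk) ** 2 for y, t in zip(outputs, target) for yk, tk in zip(y, t))

frames = [[0.1 * t, 0.0, 1.0 - 0.1 * t] for t in range(5)]
ae = LSTMAutoencoder(frame_dim=3, n_hidden=8)
print(len(ae.decode(ae.encode(frames), len(frames))))  # 5
```

With the same seed, the conditional and unconditional variants produce the same first output frame and can only diverge from the second step onward, because that is when the conditional decoder starts seeing its own output.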

2) LSTM Future Predictor Model:

  This model is similar to the one above; the main difference lies in the output. Instead of reconstructing the input, it predicts the frames that come just after the input sequence. Like the autoencoder, it comes in conditional and unconditional versions.

Intuition: to predict the next few frames correctly, the model needs to know which objects are present and how they are moving, so that the motion can be extrapolated.
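To make the future-prediction target concrete, here is a toy sketch on hypothetical 1-D "videos" of a single moving dot (not the paper's data). The names `moving_dot_clip`, `split_for_prediction`, `squared_error` and `extrapolate` are illustrative. `extrapolate` is a hand-coded constant-velocity rule standing in for what the LSTM would have to learn; it is not the paper's model. A predictor that only knows appearance (repeating the last input frame) is penalized, while one that also knows the motion is not.

```python
from typing import List, Sequence, Tuple

Frame = List[float]

def moving_dot_clip(width: int, n_frames: int, start: int = 0, velocity: int = 1) -> List[Frame]:
    """Toy 1-D 'video': one bright pixel moving at constant velocity, wrapping at the edge."""
    clip = []
    for t in range(n_frames):
        frame = [0.0] * width
        frame[(start + velocity * t) % width] = 1.0
        clip.append(frame)
    return clip

def split_for_prediction(clip: Sequence[Frame], n_input: int,
                         n_future: int) -> Tuple[List[Frame], List[Frame]]:
    """Frames the encoder reads, and the frames right after them that the predictor must output."""
    if n_input < 1 or n_future < 1 or len(clip) < n_input + n_future:
        raise ValueError("clip must hold n_input + n_future frames")
    inputs = [list(f) for f in clip[:n_input]]
    future = [list(f) for f in clip[n_input:n_input + n_future]]
    return inputs, future

def squared_error(predicted: Sequence[Frame], target: Sequence[Frame]) -> float:
    return sum((p - t) ** 2 for pf, tf in zip(predicted, target) for p, t in zip(pf, tf))

def extrapolate(inputs: Sequence[Frame], n_future: int) -> List[Frame]:
    """Hand-coded constant-velocity guess (needs >= 2 one-hot input frames):
    the 'where is it and how is it moving' knowledge the predictor must learn."""
    width = len(inputs[-1])
    p_prev, p_last = inputs[-2].index(1.0), inputs[-1].index(1.0)
    v = p_last - p_prev  # modular arithmetic below also handles wrap-around
    out = []
    for k in range(1, n_future + 1):
        frame = [0.0] * width
        frame[(p_last + v * k) % width] = 1.0
        out.append(frame)
    return out

clip = moving_dot_clip(width=8, n_frames=6)
inputs, future = split_for_prediction(clip, n_input=4, n_future=2)
static_guess = [inputs[-1]] * len(future)  # appearance only: repeat the last frame
print(squared_error(static_guess, future))            # 4.0
print(squared_error(extrapolate(inputs, 2), future))  # 0.0
```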

3) A Composite Model:

  This model combines input reconstruction and future prediction into a single, more powerful model. The two decoders share the same encoder, which encodes the input sequence into a feature vector that is copied to both decoders.

Intuition: the shared encoder must learn representations that contain not only the static appearance of objects and background, but also dynamic information, such as which objects are moving and how they move.
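The composite model's data flow can be sketched as follows. This sketch rests on several assumptions. A vanilla tanh RNN cell replaces the LSTM purely to keep the block short. Both decoders are unconditional, and weights are random and untrained. The two squared-error losses are summed with equal weight. All names (`RNNCell`, `Decoder`, `CompositeModel`) are hypothetical. The point is that a single encoder representation is copied to both decoders, so in a trained version both losses would shape the same encoder.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]

def rand_matrix(rng: random.Random, rows: int, cols: int) -> List[Vec]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def affine(W: List[Vec], v: Vec) -> Vec:
    """W times [v, 1]: matrix-vector product with a bias column."""
    v = list(v) + [1.0]
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def squared_error(ys: List[Vec], ts: List[Vec]) -> float:
    return sum((y - t) ** 2 for yv, tv in zip(ys, ts) for y, t in zip(yv, tv))

class RNNCell:
    """Vanilla tanh RNN cell, used in place of an LSTM only to keep the sketch short."""
    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        self.n_hidden = n_hidden
        self.W = rand_matrix(rng, n_hidden, n_in + n_hidden + 1)

    def step(self, x: Vec, h: Vec) -> Vec:
        return [math.tanh(a) for a in affine(self.W, list(x) + list(h))]

class Decoder:
    """Unconditional decoder (assumed): starts from a copied state, gets zero input."""
    def __init__(self, frame_dim: int, n_hidden: int, rng: random.Random):
        self.frame_dim = frame_dim
        self.cell = RNNCell(frame_dim, n_hidden, rng)
        self.readout = rand_matrix(rng, frame_dim, n_hidden + 1)

    def run(self, h: Vec, n_steps: int) -> List[Vec]:
        x = [0.0] * self.frame_dim
        outputs = []
        for _ in range(n_steps):
            h = self.cell.step(x, h)
            outputs.append(affine(self.readout, h))
        return outputs

class CompositeModel:
    def __init__(self, frame_dim: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.encoder = RNNCell(frame_dim, n_hidden, rng)         # shared encoder
        self.recon_decoder = Decoder(frame_dim, n_hidden, rng)   # reconstruction branch
        self.future_decoder = Decoder(frame_dim, n_hidden, rng)  # future-prediction branch

    def encode(self, frames: List[Vec]) -> Vec:
        h = [0.0] * self.encoder.n_hidden
        for x in frames:
            h = self.encoder.step(x, h)
        return h

    def losses(self, inputs: List[Vec], future: List[Vec]) -> Tuple[float, float]:
        rep = self.encode(inputs)  # one representation, copied to both decoders
        recon = self.recon_decoder.run(list(rep), len(inputs))
        pred = self.future_decoder.run(list(rep), len(future))
        return squared_error(recon, inputs[::-1]), squared_error(pred, future)

    def loss(self, inputs: List[Vec], future: List[Vec]) -> float:
        # Equal weighting of the two terms is an assumption of this sketch.
        return sum(self.losses(inputs, future))
```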
