【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs
Note here: it's a learning notes on new LSTMs architecture used as an unsupervised learning way of video representations.
(More unsupervised learning related topics, you can refer to:
Learning Temporal Embeddings for Complex Video Analysis
Unsupervised Learning of Visual Representations using Videos
Unsupervised Visual Representation Learning by Context Prediction)
Link: http://arxiv.org/abs/1502.04681
Motivation:
- Understanding temporal sequences is important for solving many video related problems. We should utilize temporal structure of videos as a supervisory signal for unsupervised learning.
Proposed model:
In this paper, the author proposed three models based on LSTM:
1) LSTM Autoencoder Model:

This model is composed of two parts, the encoder and the decoder.
The encoder accepts sequences of frames as input, and the learned representation generated from encoder are copied to decoder as initial input. Then the decoder should reconstruct similar images like input frames in reverse order.
(This is called unconditional version, while a conditional version receives last generated output of decoder as input, shown as the dashed boxes below)
Intuition: The reconstruction work requires the network to capture information about the appearance of objects and the background, this is exactly the information that we would like the representation to contain.
2) LSTM Future Predictor Model:

This model is similar with the one above. The main difference lies in the output. Output of this model is the prediction of frames that come just after the input sequences. It also varies with conditional/unconditional versions just like the description above.
Intuition: In order to predict the next few frames correctly, the model needs information about which objects are present and how they are moving so that the motion can be extrapolated.
3) A Composite Model:

This model combines "input reconstruction" and "future prediction" together to form a more powerful model. These two modules share a same encoder, which encodes input sequences into a feature vector and copy them to different decoders.
Intuition: this only encoder learns representations that contain not only static appearance of objects&background, but also the dynamic informations like moving objects and their moving pattern.
【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs的更多相关文章
- 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos
Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...
- 论文阅读笔记(三)【AAAI2017】:Learning Heterogeneous Dictionary Pair with Feature Projection Matrix for Pedestrian Video Retrieval via Single Query Image
Introduction (1)IVPR问题: 根据一张图片从视频中识别出行人的方法称为 image to video person re-id(IVPR) 应用: ① 通过嫌犯照片,从视频中识别出嫌 ...
- ZH奶酪:【阅读笔记】Deep Learning, NLP, and Representations
中文译文:深度学习.自然语言处理和表征方法 http://blog.jobbole.com/77709/ 英文原文:Deep Learning, NLP, and Representations ht ...
- 【ML】Two-Stream Convolutional Networks for Action Recognition in Videos
Two-Stream Convolutional Networks for Action Recognition in Videos & Towards Good Practices for ...
- 【ML】ICLR2016_Delving Deeper into Convolutional Networks
ICLR2016_DELVING DEEPER INTO CONVOLUTIONAL NETWORKS Note here: Ballas recently proposed a novel fram ...
- 【RS】CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Collaborative Filtering-CoupledCF:在推荐系统深度协作过滤中学习显式和隐式的用户物品耦合
[论文标题]CoupledCF: Learning Explicit and Implicit User-item Couplings in Recommendation for Deep Colla ...
- 【RS】List-wise learning to rank with matrix factorization for collaborative filtering - 结合列表启发排序和矩阵分解的协同过滤
[论文标题]List-wise learning to rank with matrix factorization for collaborative filtering (RecSys '10 ...
- 【RS】Deep Learning based Recommender System: A Survey and New Perspectives - 基于深度学习的推荐系统:调查与新视角
[论文标题]Deep Learning based Recommender System: A Survey and New Perspectives ( ACM Computing Surveys ...
- 【ML】Predict and Constrain: Modeling Cardinality in Deep Structured Prediction -预测和约束:在深度结构化预测中建模基数
[论文标题]Predict and Constrain: Modeling Cardinality in Deep Structured Prediction (35th-ICML,PMLR) [ ...
随机推荐
- 如何在Windows上挂载Linux系统分区
NFS普遍用于unix之间共享,windows默认是不支持这种文件系统的.如果我们要用windows访问NFS的话,而windows系统自身又不支持这种文件系统,那么我们该怎么办? 别急,小编这就手把 ...
- MDX 脚本语句 -- Scope
在多维表达式 (MDX) 中,下列语句用于管理 MDX 脚本中的上下文.作用域和流控制. 主题 说明 calculate语句 计算子多维数据集,还可以确定子多维数据集中所包含的求解次序 case语句 ...
- Android中长度单位和边距
Android表示单位长度的方式通常有三种表示方式. 距离单位☞px:表示屏幕实际的象素.例如,320*480的屏幕在横向有320个象素,在纵向有480个象素 距离单位☞dp:dp = dpi ...
- 寒假集训——搜索 B - Sudoku
#include <stdio.h> #include <stdlib.h> #include <string.h> #include <iostream&g ...
- 捕获海康威视IPCamera图像,转成OpenCV能够处理的图像(二)
海康威视IPCamera图像捕获 捕获海康威视IPCamera图像.转成OpenCV能够处理的IplImage图像(一) 捕获海康威视IPCamera图像.转成OpenCV能够处理的IplImage图 ...
- iptables和firewalld的配置
一.iptables 1.配置 vi /etc/sysconfig/iptables -A RH-Firewall-1-INPUT -m state --state NEW -p tcp -m tcp ...
- dp HDU - 5074
按题意推表达式 #include<cstdio> #include<cstring> #define max(a, b) (a)>(b)?(a):(b) ][], num ...
- 第一行代码 3-2-2 软件也要拼脸蛋-UI界面-更强大的滚动条- 卡片
<LinearLayout android:orientation="vertical" android:layout_width="match_parent&qu ...
- YOLO2(1)配置安装win10+openvc2413+VS2013 简单测试官例
参考官网 https://github.com/AlexeyAB/darknet#how-to-compile-on-windows https://github.com/AlexeyAB/darkn ...
- go标准库的学习-hash
参考:https://studygolang.com/pkgdoc 导入方式: import "hash" hash包提供hash函数的接口. type Hash type Has ...