Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )

Link of the Paper: https://ieeexplore.ieee.org/document/7298856/

A Correlative Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654)

Main Points:

A bi-directional mapping model using recurrent neural networks: unlike previous approaches which map both sentences and images to a common embedding ( and then calculate the similarity and match / generate, I guess ) that may be used for image search or for ranking image captions.
A bi-directional representation: generates both novel descriptions from images and visual representations from descriptions.
A novel recurrent visual memory: automatically learns to remember long-term visual concepts.
A set of latent variables U_t-1 that encodes the visual interpretation of the previously generated or read words W_t-1. Using U, our goal is to compute P(w_t | V, W_t-1, U_t-1) and P(V | W_t-1, U_t-1). Combining these two likelihoods together our global objective is to maximize, P(w_t, V | W_t-1, U_t-1) = P(w_t | V, W_t-1, U_t-1)P(V | W_t-1, U_t-1). That is, we want to maximize the likelihood of the word w_t and the observed visual features V given the previous words and their visual interpretation. Note that in previous papers, the objective was only to compute P(w_t | V, W_t-1) and not P(V | W_t-1).

Other Key Points:

Previous approaches project both semantics and visual features to a common embedding, they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.

Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )的更多相关文章

Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...
Paper Reading: In Defense of the Triplet Loss for Person Re-Identification
In Defense of the Triplet Loss for Person Re-Identification 2017-07-02 14:04:20 This blog comes ...
CVPR 2016 paper reading (6)
1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...
论文笔记：Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
论文笔记：Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Pap ...
【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...

随机推荐

e2fsprogs
开源文件系统ext2/ext3/ext4管理工具e2progs包含的工具组件: 1.debugfs: ext2/ext3/ext4文件系统调试工具.debugfs是一个交互式的文件系统调试工具,可以用 ...
MyBatis配置数据库连接
<environments default="default"> <environment id="default"> <tran ...
Hbuilder软件打包简述
Hbuilder打包简述: : Hbuilder安装打包Android不需要任何证书可以正常打包. : ios打包需要.mobileprovision证书和P12文件.(.mobileprovisio ...
sql中查询某月某年内的记录
假设表结构:用户名,日期,上班时间,下班时间.8月份记录:select * from 表名 where month(日期)=8 and 用户名 = '小张'8月份迟到早退次数:select sum(i ...
#leetcode刷题之路22-括号生成
给出 n 代表生成括号的对数,请你写出一个函数,使其能够生成所有可能的并且有效的括号组合. 例如,给出 n = 3,生成结果为:[ "((()))", "(()())&q ...
10JavaScript作用域
(作用域可访问变量的集合) 1.JavaScript 作用域在 JavaScript 中, 对象和函数同样也是变量. 在 JavaScript 中, 作用域为可访问变量,对象,函数的集合. Java ...
e.currentTarget与e.target
e.currentTarget指的是注册了事件监听器的对象,而e.target指的是该对象里的子对象 html中 <div id="addBtn" v-on:click= ...
[译]C语言实现一个简易的Hash table(7)
上一章我们讲了如何根据需要动态设置hash表的大小,在第四章中,我们使用了双重哈希来解决hash表的碰撞,其实解决方法有很多,这一章我们来介绍下其他方法. 本章将介绍两种解决hash表碰撞的方法: 拉 ...
MySQL用全库备份数据恢复单表数据
备份数据库时,采用了全库备份,但是因为某些原因需要回滚一个表的数据到备份数据库上,如果回滚整个库就比较费时间,因为可能这个表只有几十M,但是其它表可能有十几上百G,这时候就需要将需要恢复的表提取出来了 ...
Golang Gin 项目包依赖管理 godep 使用
Golang Gin 项目包依赖管理 godep 使用标签(空格分隔): Go 在按照github.com/tools/godep文档go get完包以后,调整项目结构为$GOPATH/src/$P ...

Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )

Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )的更多相关文章

随机推荐

热门专题