Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/
A Correlative Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654)
Main Points:
- A bi-directional mapping model using recurrent neural networks: unlike previous approaches which map both sentences and images to a common embedding ( and then calculate the similarity and match / generate, I guess ) that may be used for image search or for ranking image captions.
- A bi-directional representation: generates both novel descriptions from images and visual representations from descriptions.
- A novel recurrent visual memory: automatically learns to remember long-term visual concepts.
- A set of latent variables Ut-1 that encodes the visual interpretation of the previously generated or read words Wt-1. Using U, our goal is to compute P(wt | V, Wt-1, Ut-1) and P(V | Wt-1, Ut-1). Combining these two likelihoods together our global objective is to maximize, P(wt, V | Wt-1, Ut-1) = P(wt | V, Wt-1, Ut-1)P(V | Wt-1, Ut-1). That is, we want to maximize the likelihood of the word wt and the observed visual features V given the previous words and their visual interpretation. Note that in previous papers, the objective was only to compute P(wt | V, Wt-1) and not P(V | Wt-1).

Other Key Points:
- Previous approaches project both semantics and visual features to a common embedding, they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.
Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )的更多相关文章
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
- Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...
- Paper Reading: In Defense of the Triplet Loss for Person Re-Identification
In Defense of the Triplet Loss for Person Re-Identification 2017-07-02 14:04:20 This blog comes ...
- CVPR 2016 paper reading (6)
1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...
- 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
- 论文笔记:Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Pap ...
- 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...
随机推荐
- iOS文本文件的编码检测
windows上很多文本未必是用UTF8,所以在iOS上读取的时候,如何得到文件的编码是个问题.网上有很多读取中文的例子,但是那些不够通用.比如说要读取日文,韩文,阿拉伯文等等的时候,就不行了(虽然一 ...
- ORM一对多查询
现有两张表,一张书籍表(Book),一张作者表(Author) 现在想查询出书本信息和书本的作者 book=Book.objects.get(name="python") book ...
- jar下载地址
java开发难免需要下载额外的jar,推荐一个地址 http://www.java2s.com/Code/Jar/CatalogJar.htm
- Mybatis 配置文件
1.核心配置文件 sqlMapConfig.xml <?xml version="1.0" encoding="UTF-8" ?> <!DOC ...
- 快速提高谷歌浏览器(Chrome)自带下载器的网速
之前每次下载东西都是复制好下载链接到迅雷中下载,会提高成倍网速,但是时间一长,感觉不方便,废话不多说,上干货~ 由于中国防火墙(GFW)的强大,在线下载Google浏览器的时候速度非常慢,如果只是单独 ...
- Linux系统查找清理磁盘大文件
本文主要介绍Linux系统磁盘使用空间不足时,如何查找大文件并进行清理的方法. 使用df-h检查一台服务器磁盘使用空间,发现磁盘已经使用了100%,其中/dev/mapper/vg_iavp-lv_r ...
- Java动态代理代码快速上手
动态代理的两个核心的点是:代理的行为 和 代理机构. 举个例子,上大学的时候,很多同学吃午饭的时候都是叫别人带饭,有一个人H特别热心肠,想了一个办法,他在门口挂了个公示牌,每天有谁想要找人带饭就写公告 ...
- 20155204 实验3《敏捷开发与XP实践》实验报告
20155204 实验3<敏捷开发与XP实践>实验报告 一.实验内容与步骤 1.研究IDEA的code菜单. 老师给的任务的是把一串代码格式化,这个任务很简单.code菜单主要是关于编辑代 ...
- 20155209实验二《Java面向对象程序设计》
20155209实验二<Java面向对象程序设计> 实验内容 初步掌握单元测试和TDD 理解并掌握面向对象三要素:封装.继承.多态 初步掌握UML建模 熟悉S.O.L.I.D原则 了解设计 ...
- # 20155224 2016-2017-2 《Java程序设计》第10周学习总结
20155224 2016-2017-2 <Java程序设计>第10周学习总结 教材学习内容总结 密码学: 主要是研究保密通信和信息保密的学科, 包括信息保密传输和信息加密存储等. 密码学 ...