Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/
A Correlative Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654)
Main Points:
- A bi-directional mapping model using recurrent neural networks: unlike previous approaches which map both sentences and images to a common embedding ( and then calculate the similarity and match / generate, I guess ) that may be used for image search or for ranking image captions.
 - A bi-directional representation: generates both novel descriptions from images and visual representations from descriptions.
 - A novel recurrent visual memory: automatically learns to remember long-term visual concepts.
 - A set of latent variables Ut-1 that encodes the visual interpretation of the previously generated or read words Wt-1. Using U, our goal is to compute P(wt | V, Wt-1, Ut-1) and P(V | Wt-1, Ut-1). Combining these two likelihoods together our global objective is to maximize, P(wt, V | Wt-1, Ut-1) = P(wt | V, Wt-1, Ut-1)P(V | Wt-1, Ut-1). That is, we want to maximize the likelihood of the word wt and the observed visual features V given the previous words and their visual interpretation. Note that in previous papers, the objective was only to compute P(wt | V, Wt-1) and not P(V | Wt-1).
 

Other Key Points:
- Previous approaches project both semantics and visual features to a common embedding, they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.
 
Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )的更多相关文章
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 )   ★
		
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
 - Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
		
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
 - Paper Reading: Stereo DSO
		
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
 - Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning
		
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...
 - Paper Reading: In Defense of the Triplet Loss for Person Re-Identification
		
In Defense of the Triplet Loss for Person Re-Identification 2017-07-02 14:04:20 This blog comes ...
 - CVPR 2016 paper reading (6)
		
1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...
 - 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
		
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
 - 论文笔记:Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
		
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Pap ...
 - 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction
		
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...
 
随机推荐
- iOS:Masonry约束经验(19-03-21更)
			
1.label约束: 1).只需约束x.y 点相关就行.宽高 长度相关不用约束,就算用boundingRectWithSize计算出来的,也可能不准. 如:top.bottom二选一,trailing ...
 - iOS 地图相关
			
参考博文:https://blog.csdn.net/zhengang007/article/details/52858198?utm_source=blogxgwz7 1.坐标系 目前常见的坐标系有 ...
 - JS中判断字符串中出现次数最多的字符及出现的次数
			
<script type="text/javascript"> var str = 'qwertyuilo.,mnbvcsarrrrrrrrtyuiop;l,mhgfd ...
 - Java实例 Part5:面向对象入门
			
目录 Part5:面向对象入门 Example01:成员变量的初始化值 Example02:单例模式的应用 -----懒汉式 -----饿汉式 Example03:汉诺塔问题的求解 Example04 ...
 - 搭建最小linux系统
			
Busybox简介 • 制作文件系统我们需要使用到Busybox 工具 – 版本为busybox-1.21.1.tar.bz2 – 开源网址是http://www.busybox.net/ – Bus ...
 - golang 后台服务设计精要
			
原文地址 守护进程 传统的后台服务一般作为守护进程(daemon)运行.linux 上创建 daemon 的步骤一般如下: 创建子进程,父进程退出: 调用系统调用 setsid() 脱离控制终端: 调 ...
 - go 网络请求篇二
			
框架地址:https://github.com/parnurzeal/gorequest package main //https://antarx.com/2018/05/05/gorequest- ...
 - vs2013发布网站合并程序是出错(ILmerge.merge:error)
			
Vs2013发布网站时,生成错误提示: 合并程序集时出错: ILMerge.Merge: ERROR!!: Duplicate type 'manage_ForcePasswrod' found in ...
 - 20155209 实验三 敏捷开发与XP实践
			
20155209 实验三 敏捷开发与XP实践 实验内容 1. XP基础 2. XP核心实践 3. 相关工具 提交点一: 在IDEA中使用工具(Code->Reformate Code)把下面代码 ...
 - 20155230 实验三《敏捷开发与XP实践》实验报告
			
20155230 实验三<敏捷开发与XP实践>实验报告 一.使用工具(Code->Reformate Code)把代码重新格式化 IDEA里的Code菜单有很多实用的功能可以帮助我们 ...