Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/
A Correlative Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654)
Main Points:
- A bi-directional mapping model using recurrent neural networks: unlike previous approaches which map both sentences and images to a common embedding ( and then calculate the similarity and match / generate, I guess ) that may be used for image search or for ranking image captions.
- A bi-directional representation: generates both novel descriptions from images and visual representations from descriptions.
- A novel recurrent visual memory: automatically learns to remember long-term visual concepts.
- A set of latent variables Ut-1 that encodes the visual interpretation of the previously generated or read words Wt-1. Using U, our goal is to compute P(wt | V, Wt-1, Ut-1) and P(V | Wt-1, Ut-1). Combining these two likelihoods together our global objective is to maximize, P(wt, V | Wt-1, Ut-1) = P(wt | V, Wt-1, Ut-1)P(V | Wt-1, Ut-1). That is, we want to maximize the likelihood of the word wt and the observed visual features V given the previous words and their visual interpretation. Note that in previous papers, the objective was only to compute P(wt | V, Wt-1) and not P(V | Wt-1).

Other Key Points:
- Previous approaches project both semantics and visual features to a common embedding, they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.
Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )的更多相关文章
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
- Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...
- Paper Reading: In Defense of the Triplet Loss for Person Re-Identification
In Defense of the Triplet Loss for Person Re-Identification 2017-07-02 14:04:20 This blog comes ...
- CVPR 2016 paper reading (6)
1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...
- 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
- 论文笔记:Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Pap ...
- 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...
随机推荐
- LeetCode34.在排序数组中查找元素的第一个和最后一个位置 JavaScript
给定一个按照升序排列的整数数组 nums,和一个目标值 target.找出给定目标值在数组中的开始位置和结束位置. 你的算法时间复杂度必须是 O(log n) 级别. 如果数组中不存在目标值,返回 [ ...
- GPUImage源码解读之GLProgram
简述 GLProgram是GPUImage中代表openGL ES 中的program,具有glprogram功能.其实是作者对OpenGL ES program的面向对象封装 初始化 - (id)i ...
- Java中的clone方法-理解浅拷贝和深拷贝
最近学到Java虚拟机的相关知识,更加能理解clone方法的机制了 java中的我们常常需要复制的类型有三种: 1:8种基本类型,如int,long,float等: 2:复合数据类型(数组): 3:对 ...
- 歌词解析&class
class song_song: def __init__(self,lrc_file): # 定义两个字典一个列表备用 self.song_file = lrc_file self.song_lrc ...
- S/4 HANA中发票输出切换回NAST
在S/4 HANA中,新的输出管理Output Management叫做SAP S/4HANA output control(输出控制),是基于BRF+的,而不是原来基于NAST的.关于S4新的输出控 ...
- es6 入坑笔记(三)---数组,对象扩展
数组扩展 循环 arr.foreach(){ //回调函数 function(val,index,arr){ //val:当前读取到的数组的值,index:当前读取道德数组的索引,arr:当前的数组名 ...
- MongoDB模糊查询 工具
{"Exception":{$regex:"定时发送邮件"}} //模糊查询条件 {"DateTime":-1} // ...
- django 登录注册注销
一.设计数据模型 1.数据库模型设计 作为一个用户登录和注册项目,需要保存的都是各种用户的相关信息.很显然,我们至少需要一张用户表User,在用户表里需要保存下面的信息: 用户名 密码 邮箱地址 性别 ...
- 一张图一个表——CSS选择器总结
CSS选择器总结: (这些表是一张图片^_^) 看底部 完整思维导图图片和表格的下载地址:https://download.csdn.net/download/denlnyyr/10597820 (我 ...
- phporjquery生成二维码
一.php生成二维码 下载文章末尾链接中phpcode文件 include "./phpqrcode/qrlib.php"; //QRcode::png('http://www.b ...