Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/
A Correlative Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654)
Main Points:
- A bi-directional mapping model using recurrent neural networks: unlike previous approaches which map both sentences and images to a common embedding ( and then calculate the similarity and match / generate, I guess ) that may be used for image search or for ranking image captions.
- A bi-directional representation: generates both novel descriptions from images and visual representations from descriptions.
- A novel recurrent visual memory: automatically learns to remember long-term visual concepts.
- A set of latent variables Ut-1 that encodes the visual interpretation of the previously generated or read words Wt-1. Using U, our goal is to compute P(wt | V, Wt-1, Ut-1) and P(V | Wt-1, Ut-1). Combining these two likelihoods together our global objective is to maximize, P(wt, V | Wt-1, Ut-1) = P(wt | V, Wt-1, Ut-1)P(V | Wt-1, Ut-1). That is, we want to maximize the likelihood of the word wt and the observed visual features V given the previous words and their visual interpretation. Note that in previous papers, the objective was only to compute P(wt | V, Wt-1) and not P(V | Wt-1).

Other Key Points:
- Previous approaches project both semantics and visual features to a common embedding, they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.
Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )的更多相关文章
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
- Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...
- Paper Reading: In Defense of the Triplet Loss for Person Re-Identification
In Defense of the Triplet Loss for Person Re-Identification 2017-07-02 14:04:20 This blog comes ...
- CVPR 2016 paper reading (6)
1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...
- 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
- 论文笔记:Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Pap ...
- 【CV】ICCV2015_Unsupervised Visual Representation Learning by Context Prediction
Unsupervised Visual Representation Learning by Context Prediction Note here: it's a learning note on ...
随机推荐
- GoBelieve service部署常见问题总结
问题1: 大家好,我按照文档的步骤编译im时(make install),出现 can't load package: package main: app_route.go:1:1: expected ...
- 深入理解Java虚拟机(一) 运行时数据区划分
前言:从我学Java的第一天开始,我的大学老师就告诉我 Java语言相比C.C++的语言有一个非常强大的功能,那就是自动内存管理:我们用Java编码时不需要申请或释放内存等,这些工作全部交由我们的Ja ...
- Notes 20180506 : Java程序设计语言概述
2.Java程序设计语言概述 如果对于开发语言的排行榜有所关注的话,那么会发现很长一段时间以来Java都是位居榜首的高级开发语言,作为一个Java开发者,为此感到骄傲的同时也深感忧虑,骄傲的是自己接触 ...
- 在vscode中使用webpack中安装的echarts文件失败,dom获取class名,图表不显示
所有的东西都是新学的,所以遇到了很多问题: (1)首先,在电脑上已经安装了node的情况下, 在npm中安装echarts:npm install echarts --save mac系统在最前面加上 ...
- iOS通过切片仿断点机制上传文件
项目开发中,有时候我们需要将本地的文件上传到服务器,简单的几张图片还好,但是针对iPhone里面的视频文件进行上传,为了用户体验,我们有必要实现断点上传.其实也不是真的断点,这里我们只是模仿断点机制. ...
- Rabbitmq(三)
1.在服务器安装好rabbitmq后,自己配置自己用的vhost,exchange和queue的绑定 2.项目添加RabbitMqClient.dll(nuget获取)引用 3.添加helper就可以 ...
- DELPHI一个对付内存汇漏的办法和技巧
DELPHI是要手动释放内存的,如果客户端程序有泄漏,可能不是很大问题, 但是如果你是用DELPHI做服务端程序,有泄漏的话,时间一长会占用很多内存,直到服务端程序要关闭重启.所以内存泄漏还是有害的. ...
- Python学习 :面向对象(一)
面向对象 一.定义 面向对象:面向对象为类和对象之间的应用 class + 类名: #在类中的函数称作 “方法“ def + 方法名(self,arg): #方法中第一个参数必须是 self prin ...
- 09-if判断语句
https://www.liaoxuefeng.com/ 廖雪峰的官方网站 hm_01_判断年龄.py 1 #!/usr/bin/env python3 # coding=utf-8 def mai ...
- Go黑帽子
使用go语言来实现python黑帽子和绝技的代码 1.unix密码破解器 package main import( "bufio" "flag" "i ...