Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555
Main Points:
- A generative model ( NIC, GoogLeNet + LSTM ) based on a deep recurrent architecture: the model is trained to maximize the likelihoodP(S|I) of the target description sentence given the training image I. S = { S1, S2, ... } is the target sequence of words and each word St comes from a given dictionary, that describes the image adequately.
- The authors use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences. They call this model the Neural Image Caption, or NIC.

Other Key Points:
- A description must capture not only the objects contained in an image, but it also must express how these objects relate to each other as well as their attributes and the activities they are involved in.
- The inspiration of Image Captioning could come from advances in Machine Translation.
- There are multiple approaches that can be used to generate a sentence given an image, with NIC. The first one is Sampling where the authors just sample the first word according to p1, then provide the corresponding embedding as input and sample p2, continuing like this until we sample the special end-of-sentence token or some maximum length. The second one is Beamsearch: iteratively consider the set of the k best sentences up to time t as candidates to generate sentences of size t+1, and keep only the resulting best k of them. This better approximates S = arg maxS' p(S'|I).
Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )的更多相关文章
- [Paper Reading] Show and Tell: A Neural Image Caption Generator
论文链接:https://arxiv.org/pdf/1411.4555.pdf 代码链接:https://github.com/karpathy/neuraltalk & https://g ...
- Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )
Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf Main Points: Encoder-Decoder Framework: Enco ...
- [Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
论文链接:https://arxiv.org/pdf/1502.03044.pdf 代码链接:https://github.com/kelvinxu/arctic-captions & htt ...
- Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...
- [Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)
Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...
- Paper Reading - Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge
Link of the Paper: https://arxiv.org/abs/1609.06647 A Correlative Paper: Show and Tell: A Neural Ima ...
- 论文:Show and Tell: A Neural Image Caption Generator-阅读总结
Show and Tell: A Neural Image Caption Generator-阅读总结 笔记不能简单的抄写文中的内容,得有自己的思考和理解. 一.基本信息 标题 作者 作者单位 发表 ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- CVPR 2016 paper reading (6)
1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...
随机推荐
- linux系统安装redis服务器与php redis扩展
一 安装redis服务 1更新yum源 CentOS/RHEL 7.x: rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-lat ...
- execute immediate
首先在这里发发牢骚,指责下那些刻板的书写方式,不考虑读者理不理解,感觉就是给专业人员用来复习用的一样,没有前戏,直接就高潮,实在受不了!没基础或基础差的完全不知道发生了什么,一脸懵逼的看着,一星差评! ...
- C#设计模式 —— 单例模式
嗯,这是本人的第一篇随笔,就从最简单的单例模式开始,一步一步地记录自己的成长. 单例模式是最常见的设计模式之一,在项目代码中几乎随处可见.这个设计模式的目的就是为了保证实例只能存在一个.单例模式往下还 ...
- 前端基础-BOM和DOM的介绍
阅读目录 BOM和DOM的简述 DOM详细操作 事件 一.BOM和DOM的简述 到目前为止,我们已经学过了JavaScript的一些简单的语法.但是这些简单的语法,并没有和浏览器有任何交互. 也就是我 ...
- 【PTA 天梯赛】L1-046 整除光棍(除法模拟)
这里所谓的“光棍”,并不是指单身汪啦~ 说的是全部由1组成的数字,比如1.11.111.1111等.传说任何一个光棍都能被一个不以5结尾的奇数整除.比如,111111就可以被13整除. 现在,你的程序 ...
- 【ZOJ 2996】(1+x)^n(二项式定理)
Please calculate the coefficient modulo 2 of x^i in (1+x)^n. Input For each case, there are two inte ...
- JavaWeb日常笔记
1. XML文档的作用和解析 1. XML的基本概述: XML的主要是用来存储一对多的数据,另外还可以用来当做配置文件存储数据.XML的表头如下: <?xml version='1.0' e ...
- jQuery之修改li下样式和图片
<script type="text/javascript"> $(document).ready(function(){ $('li').click(function ...
- 杂项(乌班图、flex的使用实例)
查看乌班图当前系统版本:lsb_release -a 转载于博客园:flex的使用实例
- 基于OMAPL138的Linux字符驱动_GPIO驱动AD9833(一)之miscdevice和ioctl
基于OMAPL138的Linux字符驱动_GPIO驱动AD9833(一)之miscdevice和ioctl 0. 导语 在嵌入式的道路上寻寻觅觅很久,进入嵌入式这个行业也有几年的时间了,从2011年后 ...