Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

Link of the Paper: https://arxiv.org/abs/1411.4555

Main Points:

A generative model ( NIC, GoogLeNet + LSTM ) based on a deep recurrent architecture: the model is trained to maximize the likelihoodP(S|I) of the target description sentence given the training image I. S = { S₁, S₂, ... } is the target sequence of words and each word S_t comes from a given dictionary, that describes the image adequately.
The authors use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences. They call this model the Neural Image Caption, or NIC.

Other Key Points:

A description must capture not only the objects contained in an image, but it also must express how these objects relate to each other as well as their attributes and the activities they are involved in.
The inspiration of Image Captioning could come from advances in Machine Translation.
There are multiple approaches that can be used to generate a sentence given an image, with NIC. The first one is Sampling where the authors just sample the first word according to p₁, then provide the corresponding embedding as input and sample p₂, continuing like this until we sample the special end-of-sentence token or some maximum length. The second one is Beamsearch: iteratively consider the set of the k best sentences up to time t as candidates to generate sentences of size t+1, and keep only the resulting best k of them. This better approximates S = arg max_S'p(S'|I).

Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )的更多相关文章

[Paper Reading] Show and Tell: A Neural Image Caption Generator
论文链接:https://arxiv.org/pdf/1411.4555.pdf 代码链接:https://github.com/karpathy/neuraltalk & https://g ...
Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )
Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf Main Points: Encoder-Decoder Framework: Enco ...
[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
论文链接:https://arxiv.org/pdf/1502.03044.pdf 代码链接:https://github.com/kelvinxu/arctic-captions & htt ...
Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...
[Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)
Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...
Paper Reading - Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge
Link of the Paper: https://arxiv.org/abs/1609.06647 A Correlative Paper: Show and Tell: A Neural Ima ...
论文：Show and Tell: A Neural Image Caption Generator-阅读总结
Show and Tell: A Neural Image Caption Generator-阅读总结笔记不能简单的抄写文中的内容,得有自己的思考和理解. 一.基本信息标题作者作者单位发表 ...
Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
CVPR 2016 paper reading (6)
1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...

随机推荐

JAVA类加载和初始化
JAVA类的加载和初始化一.类的加载和初始化过程 JVM将类的加载分为3个步骤: 1.加载(Load):class文件创建Class对象. 2.链接(Link) 3.初始化(Initialize) ...
[Oracle]Oracle良性SQL建议
(1)选择最有效率的表名顺序(只在基于规则的优化器中有效): Oracle的解析器按照从右到左的顺序处理FROM子句中的表名,FROM子句中写在最后的表(基础表 driving table)将被最先处 ...
【2016 ICPC亚洲区域赛北京站 E】What a Ridiculous Election（BFS预处理）
Description In country Light Tower, a presidential election is going on. There are two candidates, ...
使用mock.js进行数据模拟
mock.js的文档真的是无力吐槽,只说明API怎么使用,完全不说明mock.js这个工具怎么用到项目里面,最有意思的是google的大部分文章复制官网的API,不管是react还是Vue都是下面的流 ...
课时54.audio标签（掌握）
1.什么是audio标签? 播放音频格式: <audio src=""> </audio> 也是由于同样的适配问题,所以出现了第二种格式 <audi ...
tp3.2和Bootstrap模态框导入excel表格数据
导入按钮 <button class="btn btn-info" type="button" id="import" data-to ...
html学习笔记--标签大全
一.HTML标记标签:!DOCTYPE 说明:指定了 HTML 文档遵循的文档类型定义(DTD). 标签:a 说明:标明超链接的起始或目的位置. 标签:acronym 说明:标明缩写词. ...
Python学习：14.Python面向对象（一）
一.面向对象简介 Python设计之初,就是一门面向对象的语言,在Python中一切皆对象,而且在Python中创建一个对象也很简单,今天我们就来学习一下Python的面向对象的知识. 二.两种编程方 ...
改脚本之dbscaner
默认的DBscaner只是用了ipy模块支持一个段的解析,但是我想让他加载脚本进行检测所以,直接看 def __init__(self, target, thread): self.target = ...
Go 学习之路：Println 与 Printf 的区别
Println 和Printf 都是fmt包中公共方法:在需要打印信息时常用的函数,那么二函数有什么区别呢? 附上代码 package main import ("time"&qu ...

Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )的更多相关文章

随机推荐

热门专题