Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555
Main Points:
- A generative model ( NIC, GoogLeNet + LSTM ) based on a deep recurrent architecture: the model is trained to maximize the likelihoodP(S|I) of the target description sentence given the training image I. S = { S1, S2, ... } is the target sequence of words and each word St comes from a given dictionary, that describes the image adequately.
- The authors use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences. They call this model the Neural Image Caption, or NIC.

Other Key Points:
- A description must capture not only the objects contained in an image, but it also must express how these objects relate to each other as well as their attributes and the activities they are involved in.
- The inspiration of Image Captioning could come from advances in Machine Translation.
- There are multiple approaches that can be used to generate a sentence given an image, with NIC. The first one is Sampling where the authors just sample the first word according to p1, then provide the corresponding embedding as input and sample p2, continuing like this until we sample the special end-of-sentence token or some maximum length. The second one is Beamsearch: iteratively consider the set of the k best sentences up to time t as candidates to generate sentences of size t+1, and keep only the resulting best k of them. This better approximates S = arg maxS' p(S'|I).
Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )的更多相关文章
- [Paper Reading] Show and Tell: A Neural Image Caption Generator
论文链接:https://arxiv.org/pdf/1411.4555.pdf 代码链接:https://github.com/karpathy/neuraltalk & https://g ...
- Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )
Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf Main Points: Encoder-Decoder Framework: Enco ...
- [Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
论文链接:https://arxiv.org/pdf/1502.03044.pdf 代码链接:https://github.com/kelvinxu/arctic-captions & htt ...
- Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...
- [Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)
Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...
- Paper Reading - Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge
Link of the Paper: https://arxiv.org/abs/1609.06647 A Correlative Paper: Show and Tell: A Neural Ima ...
- 论文:Show and Tell: A Neural Image Caption Generator-阅读总结
Show and Tell: A Neural Image Caption Generator-阅读总结 笔记不能简单的抄写文中的内容,得有自己的思考和理解. 一.基本信息 标题 作者 作者单位 发表 ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- CVPR 2016 paper reading (6)
1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...
随机推荐
- 【腾讯敏捷转型No.3】Scrum有什么好?
在敏捷转型的过程中,除了敏捷宣言中的四个价值观和十二条原则以外,并没有太多比较权威的理论实践. 如图一,敏捷宣言中的四个价值观: (图一) 四条敏捷核心价值观指出了敏捷的核心思想,但是并没有仔细说明具 ...
- No MyBatis mapper was found in '[com.wuji.springboot]' package. Please check your configuration
No MyBatis mapper was found in '[com.wuji.springboot]' package. Please check your configuration. 这个原 ...
- vue数组赋值
在使用vue开发移动端项目过程中,统一数组在对多个变量赋值时:希望一个数组的改变不影响另外一个数组,此时可以使用如下方式实现: let arr = [] let a1 = JSON.parse(JSO ...
- iview中tree的事件运用
iview中的事件和方法如下: 案例说明: html代码 <Tree :data="data4" @on-check-change="choiceAll" ...
- Sql主从同步服务
主服务器A:192.168.1.102从服务器B:192.168.1.103 先关掉主服务器phpstudy,把数据库备份到从服务器 1.授权用户:在A服务器新建一个从账号锁定IP GRANT REP ...
- 利用RTTI实现Delphi的多播事件代理研究
我们知道Delphi的每个对象可以包含多个Property,Property中可以是方法,例如TButton.OnClick属性.Delphi提供的仅仅是 一对一的设置,无法直接让TButton.On ...
- Hadoop HA 高可用集群搭建
一.首先配置集群信息 vi /etc/hosts 二.安装zookeeper 1.解压至/usr/hadoop/下 .tar.gz -C /usr/hadoop/ 2.进入/usr/hadoop/zo ...
- iOS开发者证书-详解/生成/使用
本文假设你已经有一些基本的Xcode开发经验, 并注册了iOS开发者账号. 相关基础 加密算法 现代密码学中, 主要有两种加密算法: 对称密钥加密 和 公开密钥加密. 对称密钥加密 对称密钥加密(Sy ...
- 快速认识LinkIt 7697开发板
LinkIt 7697是一款多功能且价格亲民的开发板,可用来连接网络或你的各项装置,同时提供Wi-Fi及蓝芽两种联机功能.此开发板采用MediaTek MT7697芯片,比起其他类似的Wi-Fi/蓝芽 ...
- c语言异常处理机制
异常处理机制:setjmp()函数与longjmp()函数 C标准库提供两个特殊的函数:setjmp() 及 longjmp(),这两个函数是结构化异常的基础,正是利用这两个函数的特性来实现异常. 所 ...