Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555
Main Points:
- A generative model ( NIC, GoogLeNet + LSTM ) based on a deep recurrent architecture: the model is trained to maximize the likelihood P(S|I) of the target description sentence S given the training image I. Here S = { S1, S2, ... } is the target sequence of words that adequately describes the image, and each word St comes from a given dictionary.
- The authors use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences. They call this model the Neural Image Caption, or NIC.
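The training objective in the first point can be written out with the chain rule. This is a sketch in the notation of these notes; the symbols \theta (model parameters) and N (sentence length) are assumptions introduced here for illustration.

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(I,\,S)} \log p(S \mid I; \theta),
\qquad
\log p(S \mid I) = \sum_{t=1}^{N} \log p(S_t \mid I, S_1, \ldots, S_{t-1})
```

Each factor p(St | I, S1, ..., St-1) is what the LSTM decoder models: the image enters through the CNN features, and the previous words are summarized by the recurrent state.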
Other Key Points:
- A description must capture not only the objects contained in an image but also how these objects relate to each other, their attributes, and the activities they are involved in.
- The approach is inspired by advances in Machine Translation, where an encoder RNN reads the source sentence into a fixed-length vector and a decoder RNN generates the target sentence. NIC replaces the encoder RNN with a CNN.
- NIC can generate a sentence for an image in more than one way. The first is Sampling: sample the first word according to p1, provide the corresponding embedding as input, sample from p2, and continue until the special end-of-sentence token is sampled or some maximum length is reached. The second is BeamSearch: iteratively take the k best sentences up to time t as candidates, extend them to sentences of size t+1, and keep only the best k of the results. This better approximates S = arg max_S' p(S'|I).
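The two decoding strategies can be sketched on a toy example. This is a minimal illustration, not the authors' implementation: `TOY_MODEL`, `next_word_probs`, `sample_caption` and `beam_search` are hypothetical names, and the toy distribution depends only on the previous word, whereas the real NIC decoder conditions on the image and the full LSTM state.

```python
import math
import random

# Hypothetical toy "decoder": next-word probabilities given only the last word.
# (Assumption for illustration; NIC would use CNN features and an LSTM here.)
TOY_MODEL = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.3, "</s>": 0.2},
    "the": {"dog": 0.9, "</s>": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def next_word_probs(prefix):
    """Return the distribution p_t over the next word for a word prefix."""
    return TOY_MODEL[prefix[-1]]


def strip_markers(words):
    """Drop the start token and a trailing end-of-sentence token, if any."""
    words = words[1:]
    return words[:-1] if words and words[-1] == "</s>" else words


def sample_caption(max_len=10, rng=random):
    """Sampling: draw each word from p_t until </s> or max_len words."""
    words = ["<s>"]
    while len(words) <= max_len:
        probs = next_word_probs(words)
        choices, weights = zip(*probs.items())
        word = rng.choices(choices, weights=weights, k=1)[0]
        words.append(word)
        if word == "</s>":
            break
    return strip_markers(words)


def beam_search(k=2, max_len=10):
    """BeamSearch: keep the k best partial sentences at every time step."""
    beams = [(0.0, ["<s>"])]  # (log-probability, words so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, words in beams:
            for word, p in next_word_probs(words).items():
                candidates.append((logp + math.log(p), words + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, words in candidates[:k]:
            # Sentences that emitted </s> leave the beam and are kept aside.
            (finished if words[-1] == "</s>" else beams).append((logp, words))
        if not beams:
            break
    logp, words = max(finished or beams, key=lambda c: c[0])
    return strip_markers(words), logp


if __name__ == "__main__":
    print("sampled:", sample_caption(rng=random.Random(0)))
    for k in (1, 2):
        caption, logp = beam_search(k=k)
        print(f"beam k={k}:", caption, round(math.exp(logp), 2))
```

On this toy distribution, k=1 (greedy decoding) commits to "a" first and ends with "a dog" at probability 0.30, while k=2 keeps "the" alive and finds "the dog" at probability 0.36, which shows why a wider beam better approximates the arg max.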