Link of the Paper: https://arxiv.org/abs/1411.4555

Main Points:

  1. A generative model ( NIC, GoogLeNet + LSTM ) based on a deep recurrent architecture: the model is trained to maximize the likelihoodP(S|I) of the target description sentence given the training image I. S = { S1, S2, ... } is the target sequence of words and each word St comes from a given dictionary, that describes the image adequately.
  2. The authors use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences. They call this model the Neural Image Caption, or NIC.

  

Other Key Points:

  1. A description must capture not only the objects contained in an image, but it also must express how these objects relate to each other as well as their attributes and the activities they are involved in.
  2. The inspiration of Image Captioning could come from advances in Machine Translation.
  3. There are multiple approaches that can be used to generate a sentence given an image, with NIC. The first one is Sampling where the authors just sample the first word according to p1, then provide the corresponding embedding as input and sample p2, continuing like this until we sample the special end-of-sentence token or some maximum length. The second one is Beamsearch: iteratively consider the set of the k best sentences up to time t as candidates to generate sentences of size t+1, and keep only the resulting best k of them. This better approximates S = arg maxS' p(S'|I).

Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )的更多相关文章

  1. [Paper Reading] Show and Tell: A Neural Image Caption Generator

    论文链接:https://arxiv.org/pdf/1411.4555.pdf 代码链接:https://github.com/karpathy/neuraltalk & https://g ...

  2. Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )

    Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf Main Points: Encoder-Decoder Framework: Enco ...

  3. [Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

    论文链接:https://arxiv.org/pdf/1502.03044.pdf 代码链接:https://github.com/kelvinxu/arctic-captions & htt ...

  4. Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )

    Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...

  5. [Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)

    Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...

  6. Paper Reading - Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

    Link of the Paper: https://arxiv.org/abs/1609.06647 A Correlative Paper: Show and Tell: A Neural Ima ...

  7. 论文:Show and Tell: A Neural Image Caption Generator-阅读总结

    Show and Tell: A Neural Image Caption Generator-阅读总结 笔记不能简单的抄写文中的内容,得有自己的思考和理解. 一.基本信息 标题 作者 作者单位 发表 ...

  8. Paper Reading: Stereo DSO

    开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...

  9. CVPR 2016 paper reading (6)

    1. Neuroaesthetics in fashion: modeling the perception of fashionability, Edgar Simo-Serra, Sanja Fi ...

随机推荐

  1. LeetCode31.下一个排列 JavaScript

    实现获取下一个排列的函数,算法需要将给定数字序列重新排列成字典序中下一个更大的排列. 如果不存在下一个更大的排列,则将数字重新排列成最小的排列(即升序排列). 必须原地修改,只允许使用额外常数空间. ...

  2. 数字转汉字|语言代码|NSNumberFormatter

    iOS之阿拉伯数字转中文数字 - 简书 iOS中金额数字的格式化 NSNumberFormatter - 简书 ISO语言代码(ISO-639)与国家代码(ISO-3166) - CSDN博客 语种名 ...

  3. Vue聊天框默认滚动到底部

    功能场景 在开发中,我们总能遇到某些场景需要运用到聊天框,比如客服对话.如果你不是一名开发人员,可能你在使用QQ或者聊天工具的时候并没有注意到,当你发出一条消息的时候,窗体会默认滚动到最底部,让用户可 ...

  4. Vue 源码分析——构造函数原型

    在执行 npm run dev 的时候 根据script/config.js 文件中的配置 'web-full-dev': { entry: resolve('web/entry-runtime-wi ...

  5. 『ACM C++』 PTA 天梯赛练习集L1 | 021-024

    忙疯警告,这两天可能进度很慢,下午打了一下午训练赛,训练赛的题我就不拿过来的,pta就做了一点点,明天又是满课的一天,所以进度很慢啦~ -------------------------------- ...

  6. CentOS7 minimal(最小化安装)后增加的软件安装

    1.net-tools 安装,因为习惯使用ifconfig命令 2.wget安装,下载工具必不可少 3.vim安装,相比于vi个人更喜欢vim 4.yum-plugin-priorities安装,用于 ...

  7. Java中的代码块:局部代码块、构造代码块和静态代码块

    代码块 java代码中用{ }括起来的代码段叫做代码块 1.局部代码块 在局部位置,用于限定变量的生命周期 例如,下面代码中的a仅在代码块中起作用,因此会编译报错 class Demo{ public ...

  8. mac安装配置mysql

    目录 mac安装配置mysql 1.mysql的安装 2.设置root用户的密码 3.分别执行一下命令 4.配置mysql环境变量 mac安装配置mysql 1.mysql的安装 ​ 安装过程十分简单 ...

  9. 给网页标签页添加logo

    先把logo转换成后缀名是ico的图片,然后在网页头部,也就是<head></head>中间放上<link rel="shortcut icon"ty ...

  10. MongoDB安装及启动

    本机环境系统:Debian 9桌面系统:KDE Plasma ## 官网下载自己系统最新稳定版 https://www.mongodb.com/download-center#community 选择 ...