Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf

Main Points:

  1. Encoder-Decoder Framework: Encoder uses a convolutional neural network to extract a set of feature vectors which the authors refer to as annotation vectors. The extractor produces L vectors, each of  which is a D-dimensional representation corresponding to a part of the image. a = { a1, ..., aL }, ai ∈ RD. In order to obtain a correspondence between the feature vectors and portions of the 2-D image, they extract features from a lower convolutional layer unlike previous work which instead used a fully connected layer. This allows the decoder to selectively focus on certain parts of an image by weighting a subset of all the feature vectors. Decoder uses a LSTM network to produce a caption by generating one word at every time step conditioned on a context vector, the previous hidden state and the previously generated words.
  2. Two attention-based image caption generators under a common framework: a "soft" deterministic attention mechanism trainable by standard back-propagation methods; and a "hard" stochastic attention mechanism trainable by maximizing an approximate variational lower bound or equivalently by Reinforce.

Other Key Points:

  1. Rather than compress an entire image into a static representation, attention allows for salient features to dynamically come to the forefront as needed. This is especially important when there is a lot of clutter in an image. Using representations ( such as those from the very top layer of a conv net ) that distill information in image down to the most salient objects is one effective solution that has been widely adopted in previous work. Unfortunately, this has one potential drawback of losing information which could be useful for richer, more descriptive captions. Using lower-level representation can help preserve this information.

Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )的更多相关文章

  1. [Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

    论文链接:https://arxiv.org/pdf/1502.03044.pdf 代码链接:https://github.com/kelvinxu/arctic-captions & htt ...

  2. 论文笔记:Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

    Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Pap ...

  3. 论文:Show, Attend and Tell: Neural Image Caption Generation with Visual Attention-阅读总结

    Show, Attend and Tell: Neural Image Caption Generation with Visual Attention-阅读总结 笔记不能简单的抄写文中的内容,得有自 ...

  4. Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

    Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...

  5. [Paper Reading] Show and Tell: A Neural Image Caption Generator

    论文链接:https://arxiv.org/pdf/1411.4555.pdf 代码链接:https://github.com/karpathy/neuraltalk & https://g ...

  6. [Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)

    Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...

  7. Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning

    Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...

  8. Paper Reading: Stereo DSO

    开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...

  9. Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )

    Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...

随机推荐

  1. 商业化IM 客户端接口设计分析

    对于刚接触IM(即时通讯)开发,通过阅读成熟的商业代码能够对即时通讯软件大体上有个认识,比如消息发送,消息接受,消息监听,群聊,单聊,聊天室.我这边直接拿[Gobelieve IM]源码来做剖析.IM ...

  2. DB数据源之SpringBoot+MyBatis踏坑过程(五)手动使用Hikari连接池

    DB数据源之SpringBoot+MyBatis踏坑过程(五)手动使用Hikari连接池 liuyuhang原创,未经允许禁止转载  系列目录连接 DB数据源之SpringBoot+Mybatis踏坑 ...

  3. window系统mysql无法输入和无法显示中文的处理配置

    第一步:使用记事本打开mysql安装目录下的"my.ini”文件. # MySQL client library initialization. [client] port= [mysql] ...

  4. 只查看xilong.txt[共100行]内第20行到第30行的内容

    1: Test Environment [root@xilong startimes]# seq > xilong.txt [root@xilong startimes]# cat xilong ...

  5. redis安装make失败,make[1]: *** [adlist.o] Error 127....

    解压后 执行make后报错: cd src && make all make[1]: Entering directory `/home/liuchaofan/redis-3.0.7/ ...

  6. Android中,子线程使用主线程中的组件出现问题的解决方法

    Android中,主线程中的组件,不能被子线程调用,否则就会出现异常. 这里所使用的方法就是利用Handler类中的Callback(),接受线程中的Message类发来的消息,然后把所要在线程中执行 ...

  7. python学习笔记(3)---cookie & session

    一.cookie & session 1.cookie: cookie 就是由服务器发送给客户端的特殊信息,而这些信息以文本的方式存放在客户端,然后客户端每次向服务器发送请求都会带上这些特殊信 ...

  8. day 20 约束 异常处理 MD5

    1.类的约束(重点): 写一个父类.  父类中的某个方法要抛出一个异常  NotImplementError # 项目经理 class Base:     # 对子类进行了约束. 必须重写该方法    ...

  9. 使用Scala开发Apache Kafka的TOP 20大好用实践

    本文作者是一位软件工程师,他对20位开发人员和数据科学家使用Apache Kafka的方式进行了最大限度得深入研究,最终将生产实践环节需要注意的问题总结为本文所列的20条建议. Apache Kafk ...

  10. Struts2+Datagrid表格显示(可显示多表内容)

    概述 最近学到EasyUI的Datagrid数据网格,然后就做了一个小例子,中间层利用Struts2来完成,DAO层用的是Hibernate. 数据库 数据库涉及到stuednt(name,noid, ...