Link of the Paper: https://arxiv.org/abs/1805.09019

Innovations:

  • The authors propose a CNN + CNN framework for image captioning. There are four modules in the framework: vision module ( VGG-16 ), which is adopted to "watch" images; language module, which is to model sentences; attention module, which connects the vision module with the language module; prediction module, which takes the visual features from the attention module and concepts from the language module as input and predicts the next word.

        

General Points:

  • RNNs or LSTMs cannot be calculated in parallel and ignore the underlying hierarchical structure of a sentence.
  • Directly feeding the output of the CNN into the RNN treats objects in an image the same and ignores the salient objects when generating one word.
  • In both m-RNN and NIC, an image is represented by a single vector, which ignores different areas and objects in the image. A spatial attention mechanism is introduced into image captioning model in Show, attend and tell: Neural image caption generation with visual attention, which allows the model to pay attention to different areas at each time step.

Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning的更多相关文章

  1. Paper Reading - Long-term Recurrent Convolutional Networks for Visual Recognition and Description ( CVPR 2015 )

    Link of the Paper: https://arxiv.org/abs/1411.4389 Main Points: A novel Recurrent Convolutional Arch ...

  2. 使用CNN(convolutional neural nets)关键的一点是检测到的面部教程(四):学习率,学习潜能,dropout

    第七部分 让 学习率 和 学习潜能 随时间的变化 光训练就花了一个小时的时间.等结果并非一个令人心情愉快的事情.这一部分.我们将讨论将两个技巧结合让网络训练的更快! 直觉上的解决的方法是,開始训练时取 ...

  3. Paper Reading: Stereo DSO

    开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...

  4. SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

    题目:SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning 作者: Lo ...

  5. Paper Reading - Convolutional Image Captioning ( CVPR 2018 )

    Link of the Paper: https://arxiv.org/abs/1711.09151 Motivation: LSTM units are complex and inherentl ...

  6. Deep Learning 学习随记(八)CNN(Convolutional neural network)理解

    前面Andrew Ng的讲义基本看完了.Andrew讲的真是通俗易懂,只是不过瘾啊,讲的太少了.趁着看完那章convolution and pooling, 自己又去翻了翻CNN的相关东西. 当时看讲 ...

  7. Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★

    Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convol ...

  8. About CNN(convolutional neural network)

    NO.1卷积神经网络基本概念 CNN是第一个被成功训练的多层深度神经网络结构,具有较强的容错.自学习及并行处理能力.最初是为识别二维图像而设计的多层感知器,局部连接和权值共享网络结构 类似于生物神经网 ...

  9. paper 158:CNN(卷积神经网络):Dropout Layer

    Dropout作用 在hinton的论文Improving neural networks by preventing coadaptation提出的,主要作用就是为了防止模型过拟合.当模型参数较多, ...

随机推荐

  1. H5基本标签

  2. 一点一点看JDK源码(五)java.util.ArrayList 后篇之sort与Comparator

    一点一点看JDK源码(五)java.util.ArrayList 后篇之sort与Comparator liuyuhang原创,未经允许禁止转载 本文举例使用的是JDK8的API 目录:一点一点看JD ...

  3. Webstorm设置代码提示

    下载路径: https://github.com/virtoolswebplayer/ReactNative-LiveTemplate 本插件可以配合Webstorm设置代码提示. Mac下安装 We ...

  4. iOS universallinks唤醒app

    从iOS9之后,苹果就推出了这个功能,用来唤醒外部app.这个功能在那些电商app上使用尤其广泛,当你打开对应的h5网页后,上面跳出一个是否跳转app的按钮. 现在iOS11已经基本覆盖,iOS12也 ...

  5. lock free

    #include <thread> #include <iostream> #include <mutex> #include <atomic> #in ...

  6. ajaxSubmit 在ie9或360兼容中,form下是空的

    解决办法:在<head>....</head>中加入<meta http-equiv="X-UA-Compatible" content=" ...

  7. 【mongdb主从复制和同步】

    主从同步: Master: Slave: 副本集: #在卷本中加任意主机 #登录从 #登录主 #同步日志 #仲裁: 向集群中添加主机成为仲裁 #查看集群里的成员角色参数:

  8. Redis学习笔记(一)

    定义 Redis是一个开源的使用ANSI C语言编写.支持网络.可基于内存亦可持久化的日志型.Key-Value数据库. 从该定义中抽出几个关键信息,以表示Redis的特性: 存储结构:key-val ...

  9. windows下MySQL免安装版配置教程mysql-8.0.12-winx64.zip版本

    引用1:https://blog.csdn.net/weixin_42831477/article/details/81589325 引用2:https://blog.csdn.net/qq_3193 ...

  10. scala 实现日期运算

    在scala程序中,有时我们需要对日期进行运算,比如一天之前,两天之前,一个月之前等等,本博文给出了简单的实现方式 val cal = Calendar.getInstance cal.add(Cal ...