Link of the Paper: https://arxiv.org/abs/1805.09019

Innovations:

  • The authors propose a CNN + CNN framework for image captioning, in which both the image encoder and the sentence decoder are convolutional. The framework has four modules: a vision module (VGG-16), which "watches" the image; a language module, which models the sentence; an attention module, which connects the vision module with the language module; and a prediction module, which takes the visual features from the attention module and the concepts from the language module as input and predicts the next word.
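The bullet does not say how a convolutional language module avoids peeking at future words. Convolutional decoders commonly use causal convolution: the input is padded only on the left, so the output at position t depends only on words 1..t. The sketch below assumes that design. The function name `causal_conv1d`, the kernel layout and the toy sizes are illustrative and not taken from the paper. Because every output position is computed independently, the outer loop could run in parallel, which is the contrast with an RNN's step-by-step recurrence.

```python
from typing import List

Vector = List[float]


def causal_conv1d(seq: List[Vector], kernel: List[List[Vector]]) -> List[Vector]:
    """Causal 1-D convolution (illustrative sketch, not the paper's layer).

    seq:    T input vectors of size d_in (e.g. word embeddings).
    kernel: k taps; kernel[j] is a d_out x d_in matrix applied to the input
            j steps in the past (j = 0 is the current word).
    Output t only sees seq[t-k+1 .. t], so no future word leaks in.
    """
    k = len(kernel)
    d_out = len(kernel[0])
    out: List[Vector] = []
    for t in range(len(seq)):  # positions are independent, so parallelizable
        y = [0.0] * d_out
        for j in range(k):
            src = t - j
            if src < 0:  # implicit zero-padding on the left only
                continue
            for o in range(d_out):
                y[o] += sum(w * x for w, x in zip(kernel[j][o], seq[src]))
        out.append(y)
    return out


if __name__ == "__main__":
    # 1-dim toy "embeddings" and two identity taps: out[t] = seq[t] + seq[t-1]
    seq = [[1.0], [2.0], [3.0], [4.0]]
    kernel = [[[1.0]], [[1.0]]]
    print(causal_conv1d(seq, kernel))  # [[1.0], [3.0], [5.0], [7.0]]
```

Changing the last word of `seq` changes only the last output, which is exactly the property a left-to-right caption decoder needs.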


General Points:

  • RNNs and LSTMs must process a sentence one step at a time, so they cannot be computed in parallel across time steps, and they ignore the underlying hierarchical structure of a sentence.
  • Directly feeding the output of the CNN into the RNN treats all objects in an image equally and ignores which objects are salient when each word is generated.
  • In both m-RNN and NIC, an image is represented by a single vector, which ignores the different regions and objects in the image. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention introduced a spatial attention mechanism into image captioning, which lets the model attend to different regions of the image at each time step.
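The spatial attention idea in the last bullet can be sketched as a softmax over per-region scores followed by a weighted sum of region features. This is a minimal sketch, assuming dot-product scoring between a decoder query and each region vector; Show, Attend and Tell itself uses a small MLP to compute the scores, and the names `spatial_attention` and `query` are hypothetical. The regions could be, for example, the 14x14 = 196 locations of a VGG conv feature map for a 224x224 input.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(regions: List[Vector], query: Vector) -> Tuple[List[float], Vector]:
    """Soft spatial attention (dot-product scoring is a simplifying assumption).

    regions: L region feature vectors of size D.
    query:   the decoder's current state, assumed to have size D as well.
    Returns (weights over regions, attended context vector of size D).
    """
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    alpha = softmax(scores)
    dim = len(regions[0])
    context = [sum(a * r[d] for a, r in zip(alpha, regions)) for d in range(dim)]
    return alpha, context


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    alpha, ctx = spatial_attention(regions, [5.0, 0.0])
    print([round(a, 3) for a in alpha])  # [0.987, 0.007, 0.007]
```

With an all-zero query every score is 0, the weights become uniform, and the context reduces to a plain average over regions: one fixed vector no matter which word is being generated, which is the limitation the bullets attribute to single-vector image representations.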
