Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning

Link of the Paper: https://arxiv.org/abs/1805.09019

Innovations:

The authors propose a CNN + CNN framework for image captioning. There are four modules in the framework: vision module ( VGG-16 ), which is adopted to "watch" images; language module, which is to model sentences; attention module, which connects the vision module with the language module; prediction module, which takes the visual features from the attention module and concepts from the language module as input and predicts the next word.

General Points:

RNNs or LSTMs cannot be calculated in parallel and ignore the underlying hierarchical structure of a sentence.
Directly feeding the output of the CNN into the RNN treats objects in an image the same and ignores the salient objects when generating one word.
In both m-RNN and NIC, an image is represented by a single vector, which ignores different areas and objects in the image. A spatial attention mechanism is introduced into image captioning model in Show, attend and tell: Neural image caption generation with visual attention, which allows the model to pay attention to different areas at each time step.

Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning的更多相关文章

Paper Reading - Long-term Recurrent Convolutional Networks for Visual Recognition and Description ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4389 Main Points: A novel Recurrent Convolutional Arch ...
使用CNN（convolutional neural nets）关键的一点是检测到的面部教程（四）:学习率，学习潜能，dropout
第七部分让学习率和学习潜能随时间的变化光训练就花了一个小时的时间.等结果并非一个令人心情愉快的事情.这一部分.我们将讨论将两个技巧结合让网络训练的更快! 直觉上的解决的方法是,開始训练时取 ...
Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
题目:SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning 作者: Lo ...
Paper Reading - Convolutional Image Captioning ( CVPR 2018 )
Link of the Paper: https://arxiv.org/abs/1711.09151 Motivation: LSTM units are complex and inherentl ...
Deep Learning 学习随记（八）CNN（Convolutional neural network）理解
前面Andrew Ng的讲义基本看完了.Andrew讲的真是通俗易懂,只是不过瘾啊,讲的太少了.趁着看完那章convolution and pooling, 自己又去翻了翻CNN的相关东西. 当时看讲 ...
Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convol ...
About CNN（convolutional neural network）
NO.1卷积神经网络基本概念 CNN是第一个被成功训练的多层深度神经网络结构,具有较强的容错.自学习及并行处理能力.最初是为识别二维图像而设计的多层感知器,局部连接和权值共享网络结构类似于生物神经网 ...
paper 158：CNN(卷积神经网络)：Dropout Layer
Dropout作用在hinton的论文Improving neural networks by preventing coadaptation提出的,主要作用就是为了防止模型过拟合.当模型参数较多, ...

随机推荐

启动tomcat的时候为啥你启动的是8，启动起来的确实其他的Tomcat
如果发现,是启动tomcat的时候为啥你启动的是8,启动起来的确实其他的Tomcat ,你可以去看看你的环境变量,是不是配了一个tomcat,
分别用Eclipse和IDEA搭建Scala+Spark开发环境
开发机器上安装jdk1.7.0_60和scala2.10.4,配置好相关环境变量.网上资料很多,安装过程忽略.此外,Eclipse使用Luna4.4.1,IDEA使用14.0.2版本. 1. Ecli ...
chromium之message_pump_win之三
上一篇分析MessagePumpForUI,参考chromium之message_pump_win之二 MessagePumpForIO,同MessagePumpForUI,也是要实现三个函数 // ...
MySQL索引的使用及注意事项
索引是存储引擎用于快速找到记录的一种数据结构.索引优化应该是对查询性能优化最有效的手段了.索引能够轻易将查询性能提高几个数量级,"最优"的索引有时比一个"好的" ...
微信小程序车牌号码模拟键盘输入
微信小程序车牌号码模拟键盘输入练习, 未经允许,禁止转载,抄袭,如需借鉴参考等,请附上该文章连接. 相关资料参考:https://blog.csdn.net/littlerboss/article/d ...
解决ssh连接中断程序终止的问题——tmux
参考:http://www.cnblogs.com/kevingrace/p/6496899.html ssh连接有时候会异常中断,重连后原本运行的程序会中断,要解决这个问题,我们可以使用Linux终 ...
vuejs中的生命周期
vue中生命周期分为初始化,跟新状态,销毁三个阶段 1.初始化阶段:beforeCreated,created,beforeMount,mounted 2.跟新状态:beforeUpdate,upda ...
shiro中基于注解实现的权限认证过程
授权即访问控制,它将判断用户在应用程序中对资源是否拥有相应的访问权限. 如,判断一个用户有查看页面的权限,编辑数据的权限,拥有某一按钮的权限等等. 一.用户权限模型为实现一个较为灵活的用户权限数据模 ...
C#调用c++类的导出函数
C# 需要调用C++东西,但是有不想做成COM,就只好先导出类中的函数处理. 不能直接调用,需单独导出函数参考:http://blog.csdn.net/cartzhang/article/deta ...
P2351 [SDOi2012]吊灯
P2351 [SDOi2012]吊灯 https://www.luogu.org/problemnew/show/P2351 题意: 一棵树,能否全部分成大小为x的联通块. 分析: 显然x是n ...

Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning

Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning的更多相关文章

随机推荐

热门专题