[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

论文链接：https://arxiv.org/pdf/1502.03044.pdf

代码链接：https://github.com/kelvinxu/arctic-captions & https://github.com/yunjey/show-attend-and-tell & https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow

主要贡献

在这篇文章中，作者将“注意力机制（Attention Mechanism）”引入了神经机器翻译（Neural Image Captioning）领域，提出了两种不同的注意力机制：‘Soft’ Deterministic Attention Mechanism & ‘Hard’ Stochastic Attention Mechanism。下图展示了"Show, Attend and Tell"模型的整体框架。

注意力机制的关键点在于，如何从图像的特征向量a_i中计算得到上下文向量z_t。对于每一个位置i，注意力机制能够产生一个权重e_ti。在Hard Attention机制中，权重α_ti所扮演的角色是图像区域向量a_i在t时刻被选中作为解码器的信息的概率，有且只有一个区域会被选中，为此，引入变量s_t,i，当区域i被选中时为1，否则为0；在Soft Attention机制中，权重α_ti所扮演的角色是图像区域向量a_i在t时刻输入解码器的信息中所占的比例。（参考Attention机制论文阅读——Soft和Hard Attention，Multimodal —— 看图说话（Image Caption）任务的论文笔记（二）引入attention机制）

实验细节

在文章中，作者提出使用在ImageNet数据集上预训练好、不进行微调的VGGNet提取图像特征，将block5_conv4（Conv2D）提取到的feature map（14×14×512）reshape为196×512（L×D，L=196，D=512，即196个图像区域，每个区域特征向量的维度是512）的图像区域向量a_i。

To create the annotations a_i used by our decoder, we used the Oxford VGGnet pretrained on ImageNet without finetuning.

In our experiments we use the 14×14×512 feature map of the fourth convolutional layer before max pooling. This means our decoder operates on the flattened 196×512 (i.e L × D) encoding.

在文章中，作者指出，解码器LSTM初始的细胞状态（init_c）与隐层状态（init_h）由从图像中提取到的特征向量及两个独立的多层感知机（Multi-Layer Perception, MLP）决定。

The initial memory state and hidden state of the LSTM are predicted by an average of the annotation vectors fed through two separate MLPs(init,c and init,h).

[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention的更多相关文章

Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )
Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf Main Points: Encoder-Decoder Framework: Enco ...
论文笔记：Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Pap ...
论文：Show, Attend and Tell: Neural Image Caption Generation with Visual Attention-阅读总结
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention-阅读总结笔记不能简单的抄写文中的内容,得有自 ...
Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
[Paper Reading] Show and Tell: A Neural Image Caption Generator
论文链接:https://arxiv.org/pdf/1411.4555.pdf 代码链接:https://github.com/karpathy/neuraltalk & https://g ...
[Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)
Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...
Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...
Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...

随机推荐

Windows启动报错：无效的分区表解决方法，哈哈
Windows系统启动时出现Invalid Partition Table错误 2018-07-27 16:32 今天KB小网管跟大家分享一个在重装Windows系统时会出现的一个问题Invalid ...
JS 日期格式化，留作参考
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8&quo ...
struts2之使用oracle分页（10）
ToolsUtil //每页显示的记录数 public static final int NUM_PER_PAGE=5; /* * java.util.Date转java.sql.Date */ pu ...
HDU-1160-FatMouse's Speed(DP, 最长递增子序列)
链接: https://vjudge.net/problem/HDU-1160 题意: FatMouse believes that the fatter a mouse is, the faster ...
SQL的左连接与右链接
1.准备工作 Oracle 外连接(OUTER JOIN)包括以下: 左外连接(左边的表不加限制) 右外连接(右边的表不加限制) 全外连接(左右两表都不加限制) 对应SQL:LEFT/RIGHT/F ...
JQuery-UI组件化开发
===================== 页面相关样式及其脚本的引入先后顺序,如下: 1,layout.css 页面的静态基本框架布局样式 2,base.css 页面的静态细节样式 3,ui.css ...
P4047 [JSOI2010]部落划分并查集
思路:并查集+生成树提交:2次(虽然样例都没过但感觉是对的$QwQ$(判边少了一条)) 题解: 把所有点之间连边,然后$sort$一遍,从小往大加边,直到连第$n-k+1$条边(相当于是破话$k$个 ...
ueditor粘贴从word中copy的图片和文字图片无法显示的问题
我司需要做一个需求,就是使用富文本编辑器时,不要以上传附件的形式上传图片,而是以复制粘贴的形式上传图片. 在网上找了一下,有一个插件支持这个功能. WordPaster 安装方式如下: 直接使用Wor ...
python 改变函数实参的值
def change(n): n[] = 'Mr Gumby' names = ['Mrs Entity', 'Mrs. Thing'] change(names) print(names) resu ...
Codeforces 1221 E Game With String
题面第一眼以为是SG函数找规律题,然后发现并不是公平游戏.... 不过后来想了想,其实这样反而更好做. 这个游戏的一个显然的特性是,任何时候当场上存在长度 ∈[b,a)的块时,Bob必胜.(考虑贪心 ...

[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention的更多相关文章

随机推荐

热门专题