[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Paper link: https://arxiv.org/pdf/1502.03044.pdf

Code links: https://github.com/kelvinxu/arctic-captions & https://github.com/yunjey/show-attend-and-tell & https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow

Main Contributions

In this paper, the authors introduce the attention mechanism into neural image captioning and propose two variants: 'Soft' Deterministic Attention Mechanism and 'Hard' Stochastic Attention Mechanism. The figure below shows the overall framework of the "Show, Attend and Tell" model.

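The key question for the attention mechanism is how to compute the context vector z_t from the image feature vectors a_i. For each location i, the attention mechanism produces a score e_ti, which is normalized by a softmax over all locations to give the weight α_ti. In hard attention, α_ti is the probability that region vector a_i is selected at time t as the information passed to the decoder. Exactly one region is selected, so an indicator variable s_t,i is introduced that equals 1 when region i is selected and 0 otherwise. In soft attention, α_ti is the proportion that region vector a_i contributes to the information fed to the decoder at time t. (References: "Attention机制论文阅读——Soft和Hard Attention" and "Multimodal —— 看图说话（Image Caption）任务的论文笔记（二）引入attention机制".)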
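To make the difference between the two variants concrete, here is a minimal NumPy sketch of the forward computation only. It assumes the scores e are already available (in the paper they come from an attention model conditioned on a_i and the previous hidden state; here they are random stand-ins), and the function names `soft_attention` and `hard_attention` are hypothetical, not taken from any of the linked repositories. It does not cover how the stochastic variant is trained.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def soft_attention(a, e):
    # a: (L, D) region vectors; e: (L,) unnormalized scores.
    alpha = softmax(e)          # weights alpha_ti, sum to 1
    z = alpha @ a               # context vector: weighted average of regions
    return z, alpha

def hard_attention(a, e, rng):
    # Sample exactly one region with probability alpha_ti.
    alpha = softmax(e)
    i = rng.choice(len(alpha), p=alpha)
    s = np.zeros_like(alpha)    # one-hot indicator s_t,i
    s[i] = 1.0
    z = s @ a                   # context vector: the single selected region
    return z, s

rng = np.random.default_rng(0)
L, D = 196, 512
a = rng.standard_normal((L, D))   # stand-in for the image region vectors
e = rng.standard_normal(L)        # stand-in for the attention scores (assumption)

z_soft, alpha = soft_attention(a, e)
z_hard, s = hard_attention(a, e, rng)
```

In both cases z_t has dimension D. Soft attention blends all 196 regions, while hard attention returns one region vector unchanged.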

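Experimental Details

The authors extract image features with a VGGNet that is pretrained on ImageNet and not fine-tuned. The feature map produced by block5_conv4 (Conv2D), of size 14×14×512, is reshaped to 196×512 (L×D with L=196 and D=512, i.e. 196 image regions, each with a 512-dimensional feature vector) to form the image region vectors a_i.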

To create the annotations a_i used by our decoder, we used the Oxford VGGnet pretrained on ImageNet without finetuning.

In our experiments we use the 14×14×512 feature map of the fourth convolutional layer before max pooling. This means our decoder operates on the flattened 196×512 (i.e L × D) encoding.
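The flattening step can be sketched in a few lines of NumPy. The feature map here is random data standing in for the real VGG output (an assumption made for illustration), and the row-major mapping from region index to spatial position is what `reshape` does by default, not something the paper specifies.

```python
import numpy as np

# Stand-in for the 14x14x512 convolutional feature map (assumption: random data).
feature_map = np.random.default_rng(0).standard_normal((14, 14, 512))

H, W, D = feature_map.shape
# Flatten the spatial grid: each of the H*W positions becomes one region vector a_i.
annotations = feature_map.reshape(H * W, D)   # shape (196, 512), i.e. L x D

# With row-major flattening, region i sits at grid position (i // W, i % W).
i = 30
row, col = divmod(i, W)
region_vector = annotations[i]
```

Keeping this index mapping in mind is useful later when attention weights over the 196 regions are reshaped back to 14×14 for visualization.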

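The authors state that the decoder LSTM's initial cell state (init_c) and initial hidden state (init_h) are predicted from the average of the image annotation vectors, passed through two separate multilayer perceptrons (MLPs).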

The initial memory state and hidden state of the LSTM are predicted by an average of the annotation vectors fed through two separate MLPs (init,c and init,h).
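A minimal sketch of this initialization follows. The paper does not give the MLP depth or activation in the passage quoted here, so the single tanh layer per state, the hidden size n = 1024, and the weight scale are all assumptions chosen for illustration. The helper name `one_layer_mlp` is hypothetical.

```python
import numpy as np

def one_layer_mlp(x, W, b):
    # Assumption: a single affine layer with tanh stands in for each MLP.
    return np.tanh(x @ W + b)

rng = np.random.default_rng(0)
L, D, n = 196, 512, 1024               # n (LSTM hidden size) is an assumed value

a = rng.standard_normal((L, D))        # stand-in for the annotation vectors a_i
a_mean = a.mean(axis=0)                # average over the L regions, shape (D,)

# Two separate parameter sets: one for the cell state, one for the hidden state.
W_c, b_c = rng.standard_normal((D, n)) * 0.01, np.zeros(n)
W_h, b_h = rng.standard_normal((D, n)) * 0.01, np.zeros(n)

init_c = one_layer_mlp(a_mean, W_c, b_c)   # initial memory (cell) state
init_h = one_layer_mlp(a_mean, W_h, b_h)   # initial hidden state
```

Both states are computed from the same averaged image representation, but because the two networks have separate weights, init_c and init_h generally differ.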
