Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )

Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf

Main Points:

Encoder-Decoder Framework: Encoder uses a convolutional neural network to extract a set of feature vectors which the authors refer to as annotation vectors. The extractor produces L vectors, each of which is a D-dimensional representation corresponding to a part of the image. a = { a₁, ..., a_L }, a_i ∈ R^D. In order to obtain a correspondence between the feature vectors and portions of the 2-D image, they extract features from a lower convolutional layer unlike previous work which instead used a fully connected layer. This allows the decoder to selectively focus on certain parts of an image by weighting a subset of all the feature vectors. Decoder uses a LSTM network to produce a caption by generating one word at every time step conditioned on a context vector, the previous hidden state and the previously generated words.
Two attention-based image caption generators under a common framework: a "soft" deterministic attention mechanism trainable by standard back-propagation methods; and a "hard" stochastic attention mechanism trainable by maximizing an approximate variational lower bound or equivalently by Reinforce.

Other Key Points:

Rather than compress an entire image into a static representation, attention allows for salient features to dynamically come to the forefront as needed. This is especially important when there is a lot of clutter in an image. Using representations ( such as those from the very top layer of a conv net ) that distill information in image down to the most salient objects is one effective solution that has been widely adopted in previous work. Unfortunately, this has one potential drawback of losing information which could be useful for richer, more descriptive captions. Using lower-level representation can help preserve this information.

Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )的更多相关文章

[Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
论文链接:https://arxiv.org/pdf/1502.03044.pdf 代码链接:https://github.com/kelvinxu/arctic-captions & htt ...
论文笔记：Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Pap ...
论文：Show, Attend and Tell: Neural Image Caption Generation with Visual Attention-阅读总结
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention-阅读总结笔记不能简单的抄写文中的内容,得有自 ...
Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
[Paper Reading] Show and Tell: A Neural Image Caption Generator
论文链接:https://arxiv.org/pdf/1411.4555.pdf 代码链接:https://github.com/karpathy/neuraltalk & https://g ...
[Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)
Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...
Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...
Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...

随机推荐

etcd部署说明
etcd是一个K/V分布式存储,每个节点都保存完成的一份数据.有点类似redis.但是etcd不是数据库. 1.先说废话.之所以会用etcd,并不是实际项目需要,而是前面自己写的上传的DBCacheS ...
Python 学习笔记（十）Python集合（一）
回顾 int/float/str/list/tuple/dict 整数型和浮点型是不可变的,不是序列字符串是不可变的,是序列列表是可变的,是序列元组是不可变的,是序列字典是可变得,但不是序列 ...
在mac下运行 npm run eject 出现报错问题解决方法
当使用create-react-app创建项目后,接着运行npm run eject时,如果出现下面的错误可能是脚手架添加了.gitignore这个文件,但是没有本地仓库,可以使用以下代码解决这个问 ...
stack permutation
#include <iostream> #include <stack> #include <queue> using namespace std; bool ch ...
chromium之ref_counted
namespace subtle { class RefCountedBase { protected: RefCountedBase(); ~RefCountedBase(); void AddRe ...
MYSQL 8.0.11 安装过程及 Navicat 链接时遇到的问题
参考博客:https://blog.csdn.net/WinstonLau/article/details/78666423 我的系统和软件版本是这样的: 系统环境:win7.64位 MySQL版本: ...
js之广告
涉及到offset等有关获取各种尺寸问题下 <!doctype html> <html lang="en"> <head> <meta c ...
企业IT架构转型之道读后感
放假三天,用部分时间阅读了企业IT架构转型之道这本书.第一遍潦草读完,就感觉收益颇多.这本书值得多读几遍,适合精度. 作为银行IT开发人员,在央企IT成本部门的大背景下,开发过程中遇到的诸多疑惑.困惑 ...
在多字节的目标代码页中，没有此 Unicode 字符可以映射到的字符。 (#1113)
报错在使用MySQL-Front导入sql文件时报错1113:在多字节的目标代码页中,没有此 Unicode 字符可以映射到的字符. (#1113) 解决方案导入.sql文件时,单击选择文件对话 ...
关于C链表的实现
学习了数据结构后,自己学习写了一个链表的程序.初步功能是实现了.但是不知道会不会有一些隐含的问题.所以希望大佬指导指导 /******************/ /*一个小的链表程序*/ /***** ...

Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )

Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )的更多相关文章

随机推荐

热门专题