Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )
Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf
Main Points:
- Encoder-Decoder Framework: Encoder uses a convolutional neural network to extract a set of feature vectors which the authors refer to as annotation vectors. The extractor produces L vectors, each of which is a D-dimensional representation corresponding to a part of the image. a = { a1, ..., aL }, ai ∈ RD. In order to obtain a correspondence between the feature vectors and portions of the 2-D image, they extract features from a lower convolutional layer unlike previous work which instead used a fully connected layer. This allows the decoder to selectively focus on certain parts of an image by weighting a subset of all the feature vectors. Decoder uses a LSTM network to produce a caption by generating one word at every time step conditioned on a context vector, the previous hidden state and the previously generated words.
- Two attention-based image caption generators under a common framework: a "soft" deterministic attention mechanism trainable by standard back-propagation methods; and a "hard" stochastic attention mechanism trainable by maximizing an approximate variational lower bound or equivalently by Reinforce.

Other Key Points:
- Rather than compress an entire image into a static representation, attention allows for salient features to dynamically come to the forefront as needed. This is especially important when there is a lot of clutter in an image. Using representations ( such as those from the very top layer of a conv net ) that distill information in image down to the most salient objects is one effective solution that has been widely adopted in previous work. Unfortunately, this has one potential drawback of losing information which could be useful for richer, more descriptive captions. Using lower-level representation can help preserve this information.
Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )的更多相关文章
- [Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
论文链接:https://arxiv.org/pdf/1502.03044.pdf 代码链接:https://github.com/kelvinxu/arctic-captions & htt ...
- 论文笔记:Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2018-08-10 10:15:06 Pap ...
- 论文:Show, Attend and Tell: Neural Image Caption Generation with Visual Attention-阅读总结
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention-阅读总结 笔记不能简单的抄写文中的内容,得有自 ...
- Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
- [Paper Reading] Show and Tell: A Neural Image Caption Generator
论文链接:https://arxiv.org/pdf/1411.4555.pdf 代码链接:https://github.com/karpathy/neuraltalk & https://g ...
- [Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)
Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...
- Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )
Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...
随机推荐
- Swift_TableView(delegate,dataSource,prefetchDataSource 详解)
Swift_TableView(delegate,dataSource,prefetchDataSource 详解) GitHub import UIKit let identifier = &quo ...
- fis3 安装(Linux)
Linux安装fis3 1,首先安装node环境 https://segmentfault.com/a/1190000004245357 2,安装fis3 http://blog.csdn.net/g ...
- chromium之message_pump_default
看看头文件,默认的消息泵,该类实现了MessagePump的四个接口 class MessagePumpDefault : public MessagePump { public: MessagePu ...
- TinyMCE插件:Filemanager [4.x-6.x] 文件名统一格式化
上传图片程序(filemanager/upload.php) 在if (!empty($_FILES) && $upload_files)中上传图片时,在文件正式上传至服务器前,有一次 ...
- YII2集成GOAOP,实现面向方面编程!
引言: 软件开发的目标是要对世界的部分元素或者信息流建立模型,实现软件系统的工程需要将系统分解成可以创建和管理的模块.于是出现了以系统模块化特性的面向对象程序设计技术.模块化的面向对象编程极度地提高了 ...
- centos系统安装后无法稳定连接wifi的解决方法
在安装双系统的时候遇到的问题,虽然不知道原理,但是弄好能用就可以,这类bug太邪恶了 wifi不能用的情况: 先查看wifi状态: rfkill list all 0: hci0: Bluetooth ...
- python学习——初始面向对象
一.讲在前面 编程的世界中有三大体系,面向过程.面向函数和面向对象编程.而面向过程的编程就包括了面向函数编程,接下来说一下面向对象.假如 ,你现在是一家游戏公司的开发人员,现在需要你开发一款叫做< ...
- N个点中寻找多个最近两点的计算O(N²)
#include<math.h> #include<stdio.h> #include<stdlib.h> typedef struct point { float ...
- 后端系统开发利器之gflags
gflags是Google的一个开源项目,用于解析程序运行参数.gflags简单易用,它的好处在于统一配置格式,减少开发工作量.在工程实践中,gflags在简化开发和测试方面表现非常出色,它还有一个很 ...
- MapWindow介绍
官方网站:http://www.mapwindow.org/ 网站里包含了几个开源项目 目前最新版本是Mapwindow5,之前的mapwindow4版本已经停止更新,同时Mapwindow5底层是调 ...