Link of the Paper: https://arxiv.org/abs/1711.09151

Motivation:

  • LSTM units are complex and inherently sequential across time.
  • Convolutional networks have shown advantages on machine translation and conditional image generation.

Innovation:

  • The authors develop a convolutional ( CNN-based ) image captioning method that shows comparable performance to an LSTM based method on standard metrics.

    

  • The authors analyze the characteristics of CNN and LSTM nets and provide useful insights such as -- CNNs produce more entropy ( useful for diverse predictions ), better classification accuracy, and do not suffer from vanishing gradients.

Improvement:

  • Improved performance with a CNN model that uses Attention Mechanism to leverage spatial image features.

General Points:

  • Image Captioning is applicable to virtual assistants, editing tools, image indexing and support of the disabled.
  • Image Captioning is a basic ingredient for more complex operations such as storytelling and visual summarization.
  • An illustration of a classical RNN architecture for image captioning is provided below.

Paper Reading - Convolutional Image Captioning ( CVPR 2018 )的更多相关文章

  1. Paper Read: Convolutional Image Captioning

    Convolutional Image Captioning 2018-11-04 20:42:07 Paper: http://openaccess.thecvf.com/content_cvpr_ ...

  2. Paper Reading - Learning to Evaluate Image Captioning ( CVPR 2018 ) ★

    Link of the Paper: https://arxiv.org/abs/1806.06422 Innovations: The authors propose a novel learnin ...

  3. Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★

    Link of the Paper: https://arxiv.org/abs/1705.03122 Motivation: Compared to recurrent layers, convol ...

  4. Paper Reading: Stereo DSO

    开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...

  5. 爬取CVPR 2018过程中遇到的坑

    爬取 CVPR 2018 过程中遇到的坑 使用语言及模块 语言: Python 3.6.6 模块: re requests lxml bs4 过程 一开始都挺顺利的,先获取到所有文章的链接再逐个爬取获 ...

  6. 在矩池云上复现 CVPR 2018 LearningToCompare_FSL 环境

    这是 CVPR 2018 的一篇少样本学习论文:Learning to Compare: Relation Network for Few-Shot Learning 源码地址:https://git ...

  7. Paper Reading - Long-term Recurrent Convolutional Networks for Visual Recognition and Description ( CVPR 2015 )

    Link of the Paper: https://arxiv.org/abs/1411.4389 Main Points: A novel Recurrent Convolutional Arch ...

  8. Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning

    Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...

  9. Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★

    Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...

随机推荐

  1. Matplotlib——初级

    matplotlib是一个专门用来绘图的库,在分析数据的时候,使用它可以将数据进行可视化,更直观的呈现.下面是几个通过matplot绘制的图. 通过图形的绘制,我们可以很清晰地看到数据直接的关系,并对 ...

  2. css代码添加背景图片常用代码

    css代码添加背景图片常用代码 1 背景颜色 { font-size: 16px; content: ""; display: block; width: 700px; heigh ...

  3. Jquery 操作 select 的操作指南

    这里我们以一个简单的select作为原型来进行说明: <select> <option value="a1">香蕉1</option> < ...

  4. iOS之webview加载网页、文件、html的方法

    UIWebView  是用来加载加载网页数据的一个框.UIWebView可以用来加载pdf.word.doc 等等文件 生成webview 有两种方法,1.通过storyboard 拖拽  2.通过a ...

  5. 关于mybatis编写sql问题.

    最近呢楼主回到长沙进行面试:被问了一个这样的问题,在mybatis中怎么进行模糊查询,望各位大佬在下方进行评论,好让我这菜鸡多学习一些.

  6. mint-ui message box 问题;

    当引用 mint-ui message box 的 出现的问题,我暂时是不知道为什么: 官网是这样写的: 于是 我也这么做的:(这里用小写,具体我也不清楚,毕竟文档上写的也不是很清楚,但是只有这样写, ...

  7. sqlserver 导出数据库表结构

    https://www.cnblogs.com/miaomiaoquanfa/p/6909835.html SELECT 表名 = case when a.colorder=1 then d.name ...

  8. mongodb C++ Driver安装

    前言 mongocxx官网地址 http://mongocxx.org/?jmp=docs 本文的安装版本是:mongocxx-r3.2.0.tar.gz . 参考文档安装过程http://mongo ...

  9. Hive命令行及参数配置

    1 . Hive  命令行 输入$HIVE_HOME/bin/hive –H 或者 –help 可以显示帮助选项: 说明: 1. -i 初始化 HQL 文件. 2. -e 从命令行执行指定的 HQL ...

  10. Python-变量与基础数据类型

    ·变量(variable)  笔记: 变量本质上是一个占位符.变量可以用来存储整数.字符串.列表等.简单的可以理解为一个座位,可以坐老人也可以坐小孩,可以坐男孩,也可以坐女孩. 在python里,标识 ...