论文链接:https://arxiv.org/pdf/1411.4555.pdf

代码链接:https://github.com/karpathy/neuraltalkhttps://github.com/karpathy/neuraltalk2 & https://github.com/zsdonghao/Image-Captioning


主要贡献

在这篇文章中,作者借鉴了神经机器翻译(Neural Machine Translation)领域的方法,将“编码器-解码器(Encoder-Decoder)”模型引入了神经图像标注(Neural Image Captioning)领域,提出了一种端到端(end-to-end)的模型解决图像标注问题。下面展示了从论文中截取的两幅图片,第一幅图片是NIC模型的概述,第二幅图片描述了网络的细节。NIC网络采用卷积神经网络(CNN)作为编码器,长短期记忆网络(LSTM)作为解码器。

            


实验细节

Hence, it is natural to use a CNN as an image “encoder”, by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences.

An “encoder” RNN reads the source sentence and transforms it into a rich fixed-length vector representation, which in turn in used as the initial hidden state of a “decoder” RNN that generates the target sentence.

  • 在文章中,作者提出使用随机梯度下降(Stochastic Gradient Descent)训练网络。在官方给出的源码neuraltalk2中,作者给出了多种训练网络的优化器及其参数(rmsprop,adagrad,sgd……详见neuraltalk2/misc/optim_updates.lua)。zsdonghao/Image-Captioning使用SGD训练网络,初始学习率2.0,学习率衰减因子0.5,学习率下降后每一代的数量8.0。

It is a neural net which is fully trainable using stochastic gradient descent.

The model is trained to maximize the likelihood of the target description sentence given the training image.

  • 在neuraltalk2中,LSTM层的输入(Embedding层的输出)向量维度和LSTM隐藏层的向量维度均设置为512。zsdonghao/Image-Captioning的设置相同。
  • 在zsdonghao/Image-Captioning中,作者将vocabulary_size设置为12000。

版权声明:本文为博主原创文章,欢迎转载,转载请注明作者及原文出处!

[Paper Reading] Show and Tell: A Neural Image Caption Generator的更多相关文章

  1. Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )

    Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...

  2. Paper Reading - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention ( ICML 2015 )

    Link of the Paper: https://arxiv.org/pdf/1502.03044.pdf Main Points: Encoder-Decoder Framework: Enco ...

  3. [Paper Reading] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

    论文链接:https://arxiv.org/pdf/1502.03044.pdf 代码链接:https://github.com/kelvinxu/arctic-captions & htt ...

  4. [Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)

    Main Contributions: A brief introduction about two different methods (retrieval based method and gen ...

  5. Paper Reading - Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

    Link of the Paper: https://arxiv.org/abs/1609.06647 A Correlative Paper: Show and Tell: A Neural Ima ...

  6. 论文:Show and Tell: A Neural Image Caption Generator-阅读总结

    Show and Tell: A Neural Image Caption Generator-阅读总结 笔记不能简单的抄写文中的内容,得有自己的思考和理解. 一.基本信息 标题 作者 作者单位 发表 ...

  7. Paper Reading: Stereo DSO

    开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...

  8. Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )

    Link of the Paper: https://ieeexplore.ieee.org/document/7298856/ A Correlative Paper: Learning a Rec ...

  9. Paper Reading - CNN+CNN: Convolutional Decoders for Image Captioning

    Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN fra ...

随机推荐

  1. groovy基本语法--JSON

    1.groovy提供了对JSON解析的方法       ①JsonSlurper   JsonSlurper是一个将JSON文本或阅读器内容解析为Groovy数据的类结构,例如map,列表和原始类型, ...

  2. [Google Guava] 6-字符串处理:分割,连接,填充

    原文链接 译文链接 译者:沈义扬,校对:丁一 连接器[Joiner] 用分隔符把字符串序列连接起来也可能会遇上不必要的麻烦.如果字符串序列中含有null,那连接操作会更难.Fluent风格的Joine ...

  3. poi操作excel之: 将NUMERIC转换成TEXT

    一.NUMERIC TO TEXT(生成excel)代码生成一个excel文件: public static void generateExcel() throws Exception { XSSFW ...

  4. 用chrome console实现自动化操作网页

    因为chrome console只能访问当前页的上下文(以及chrome扩展的上下文),无法访问其他标签页面的上下文,所以局限性较大,仅适用于一些较简单的操作 经实践,可以在chrome的一个标签页的 ...

  5. java大视频上传实现

    理清思路: 引入了两个概念:块(block)和片(chunk).每个块由一到多个片组成,而一个资源则由一到多个块组成 块是服务端的永久数据存储单位,片则只在分片上传过程中作为临时存储的单位.服务端会以 ...

  6. leetcode解题报告(4):Search in Rotated Sorted ArrayII

    描述 Follow up for "Search in Rotated Sorted Array": What if duplicates are allowed? Would t ...

  7. Apache Kudu: Hadoop生态系统的新成员实现对快速数据的快速分析

    A new addition to the open source Apache Hadoop ecosystem, Apache Kudu completes Hadoop's storage la ...

  8. 带下划线的 HTTP Header无法获取到可能是因为nginx

    背景:新版本修改了个功能是在老版本的基础上做的,同一个接口,需要兼容老版本,因此让前台在header中封装了 version版本号,client_type 客户端类型,根据这两个字段判断接口要走的逻辑 ...

  9. sql mode 问题及解决 错误代码:1055 this is incompatible with sql_mode=only_full_group_by

    数据库升级到5.7.21后,一个正常的分组后按日期排序,并返回数据的语句开始报错: 语句如下: SELECT id,title,add_time FROM `message` GROUP BY add ...

  10. ICEM-圆环孔

    原视频下载地址:https://yunpan.cn/cSKwAIRmFLGe5  访问密码 49c5