Link of the Paper: https://ieeexplore.ieee.org/document/7298856/

A Related Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654)

Main Points:

  1. A bi-directional mapping model using recurrent neural networks: previous approaches map both sentences and images to a common embedding ( and then, I guess, compute similarities to match them ), which may be used for image search or for ranking image captions. Unlike them, this model can generate novel sentences given an image and reconstruct visual features given a description.
  2. A bi-directional representation: generates both novel descriptions from images and visual representations from descriptions.
  3. A novel recurrent visual memory: automatically learns to remember long-term visual concepts.
  4. A set of latent variables $U_{t-1}$ that encodes the visual interpretation of the previously generated or read words $W_{t-1}$. Using $U$, our goal is to compute $P(w_t \mid V, W_{t-1}, U_{t-1})$ and $P(V \mid W_{t-1}, U_{t-1})$. Combining these two likelihoods, our global objective is to maximize $P(w_t, V \mid W_{t-1}, U_{t-1}) = P(w_t \mid V, W_{t-1}, U_{t-1}) \, P(V \mid W_{t-1}, U_{t-1})$. That is, we want to maximize the likelihood of the word $w_t$ and the observed visual features $V$ given the previous words and their visual interpretation. Note that in previous papers, the objective was only to compute $P(w_t \mid V, W_{t-1})$ and not $P(V \mid W_{t-1})$.
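
The factorization in point 4 can be checked numerically. Below is a minimal sketch, not the authors' implementation: the function name `joint_log_likelihood` and all probability values are hypothetical, and summing the per-step terms over a caption is an assumed convention rather than something stated in these notes.

```python
import math

def joint_log_likelihood(p_word, p_visual):
    """Log of the joint objective for one time step.

    p_word   stands for P(w_t | V, W_{t-1}, U_{t-1})
    p_visual stands for P(V | W_{t-1}, U_{t-1})
    The joint is their product, so the log-likelihoods add.
    """
    return math.log(p_word) + math.log(p_visual)

# Hypothetical (made-up) per-step probabilities for a three-word caption.
steps = [(0.5, 0.25), (0.4, 0.5), (0.8, 0.5)]

# Assumed convention: sum the per-step joint log-likelihoods over the caption.
total = sum(joint_log_likelihood(p_w, p_v) for p_w, p_v in steps)
print(total)
```

Dropping the `p_visual` term leaves only the word likelihood, which corresponds to the objective the notes attribute to previous papers.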

Other Key Points:

  1. Because previous approaches project both semantics and visual features to a common embedding, they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.
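
To make the limitation concrete, here is a rough sketch of how a common-embedding approach ranks captions. It is an illustration only: the two-dimensional vectors, the caption strings, and the helper names `cosine` and `rank_captions` are all hypothetical, and cosine similarity is an assumed choice of score.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_captions(image_vec, caption_vecs):
    """Return caption strings sorted by similarity to the image, best first."""
    return sorted(
        caption_vecs,
        key=lambda caption: cosine(image_vec, caption_vecs[caption]),
        reverse=True,
    )

# Hypothetical embeddings that already live in the shared space.
image_vec = [1.0, 0.0]
caption_vecs = {
    "a dog on grass": [0.9, 0.1],
    "a plate of food": [0.0, 1.0],
    "a dog": [0.5, 0.5],
}

ranking = rank_captions(image_vec, caption_vecs)
print(ranking)
```

The ranking can only reorder captions that already exist in `caption_vecs`; nothing in this setup maps a point in the embedding back to a new sentence or to visual features, which is the inverse projection the notes say is missing.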

Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )