Link of the Paper: https://ieeexplore.ieee.org/document/7298856/

A Related Paper: Learning a Recurrent Visual Representation for Image Caption Generation (Link of the Paper: https://arxiv.org/abs/1411.5654)

Main Points:

  1. A bi-directional mapping model using recurrent neural networks. Previous approaches map both sentences and images to a common embedding (and then, I guess, compute similarity in that space to match or rank), which may be used for image search or for ranking image captions; unlike them, this model can generate novel sentences given an image.
  2. A bi-directional representation: generates both novel descriptions from images and visual representations from descriptions.
  3. A novel recurrent visual memory: automatically learns to remember long-term visual concepts.
  4. A set of latent variables U_{t-1} that encodes the visual interpretation of the previously generated or read words W_{t-1}. Using U, our goal is to compute P(w_t | V, W_{t-1}, U_{t-1}) and P(V | W_{t-1}, U_{t-1}). Combining these two likelihoods, our global objective is to maximize P(w_t, V | W_{t-1}, U_{t-1}) = P(w_t | V, W_{t-1}, U_{t-1}) P(V | W_{t-1}, U_{t-1}). That is, we want to maximize the likelihood of the word w_t and the observed visual features V given the previous words and their visual interpretation. Note that in previous papers, the objective was only to compute P(w_t | V, W_{t-1}) and not P(V | W_{t-1}).
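
A small sketch of how the joint objective in point 4 splits into a word term and a visual term once you take logs: log P(w_t, V | ...) = log P(w_t | V, ...) + log P(V | ...). This is only an illustration, not the paper's implementation. The function names, the softmax over word scores, and the isotropic Gaussian around a reconstructed feature vector (with a made-up `sigma`) are all my assumptions; the Gaussian term is a density, so the sum is a log-likelihood score rather than a true log-probability.

```python
import math


def softmax(scores):
    """Turn raw word scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_log_likelihood(word_scores, target_word, v_observed, v_reconstructed, sigma=1.0):
    """Return (log P(w_t | V, ...), log P(V | ...), their sum).

    Assumption (not from the paper): P(V | ...) is an isotropic Gaussian
    centred on the reconstructed features, so its log is a negative
    squared error plus a constant.
    """
    log_p_word = math.log(softmax(word_scores)[target_word])

    dim = len(v_observed)
    sq_err = sum((a - b) ** 2 for a, b in zip(v_observed, v_reconstructed))
    log_p_visual = -0.5 * sq_err / sigma ** 2 - 0.5 * dim * math.log(2 * math.pi * sigma ** 2)

    return log_p_word, log_p_visual, log_p_word + log_p_visual


if __name__ == "__main__":
    # Toy numbers: 3-word vocabulary, 2-dimensional visual features.
    lw, lv, joint = joint_log_likelihood([2.0, 0.5, -1.0], 0, [1.0, 2.0], [0.8, 2.1])
    print("word term:", lw, "visual term:", lv, "joint:", joint)
```

Maximizing the sum rewards both predicting the right next word and reconstructing the observed visual features; dropping the second term gives back the caption-only objective P(w_t | V, W_{t-1}) used in previous papers.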

Other Key Points:

  1. Previous approaches project both semantics and visual features to a common embedding, but they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.
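
To make the limitation concrete, here is a toy sketch of what a common-embedding approach does: score existing captions against an image by similarity and rank them. The vectors, captions, and function names are made up for illustration, and cosine similarity is my assumption about the scoring (the notes above only say "similarity").

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_vec, caption_vecs):
    """Sort candidate captions by similarity to the image embedding, best first."""
    scored = [(cosine(image_vec, vec), caption) for caption, vec in caption_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for _, caption in scored]


# Hypothetical embeddings: image and captions already live in the same 3-d space.
image_vec = [0.9, 0.1, 0.0]
caption_vecs = {
    "a dog runs on grass": [0.8, 0.2, 0.1],
    "a plate of food": [0.0, 0.1, 0.9],
    "a man rides a bike": [0.3, 0.9, 0.1],
}

ranking = rank_captions(image_vec, caption_vecs)
print(ranking)
```

The output is always a reordering of the candidate captions it was given. Nothing in this setup maps a point in the embedding back to a new sentence or to visual features, which is the inverse projection the paper is after.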

Paper Reading - Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation ( CVPR 2015 )